On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher

arXiv.org, Machine Learning

This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $\Theta(1/n^p)$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR.
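To illustrate the $\Theta(1/n^p)$ step-size policy discussed in the abstract, the following minimal Python sketch runs plain SGD with the vanishing schedule $\gamma_n = \gamma_0 / n^p$ against a generic stochastic gradient oracle. The oracle, the toy non-convex objective, and all names and parameter values (grad_oracle, gamma0, p) are illustrative assumptions, not taken from the paper.

    import numpy as np

    def sgd_polynomial_stepsize(grad_oracle, x0, n_iter=10_000, gamma0=0.1, p=0.75):
        """SGD with a Theta(1/n^p) step-size schedule (illustrative sketch).

        grad_oracle(x, rng) returns a stochastic gradient at x; the schedule
        gamma_n = gamma0 / n**p with 0 < p <= 1 is the kind of vanishing
        "cool-down" step-size family referred to in the abstract.
        """
        rng = np.random.default_rng(0)
        x = np.asarray(x0, dtype=float)
        for n in range(1, n_iter + 1):
            gamma_n = gamma0 / n ** p        # vanishing step-size
            x = x - gamma_n * grad_oracle(x, rng)
        return x

    if __name__ == "__main__":
        # Toy non-convex objective f(x) = sum(x_i^2 - cos(x_i)), noisy gradient 2x + sin(x) + noise
        noisy_grad = lambda x, rng: 2 * x + np.sin(x) + 0.1 * rng.standard_normal(x.shape)
        print(sgd_polynomial_stepsize(noisy_grad, x0=np.ones(5)))

The exponent p trades off early progress against late-stage noise suppression; the paper's rate result is what motivates ending training with such a vanishing (cool-down) schedule rather than a constant step-size.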
