On exponential convergence of SGD in non-convex over-parametrized learning
Raef Bassily, Mikhail Belkin, Siyuan Ma
Stochastic Gradient Descent (SGD) and its variants have become a staple of the algorithmic foundations of machine learning. Yet many of their properties are not fully understood, particularly in the non-convex settings common in modern practice. In this note, we study the convergence of SGD for the class of functions satisfying the Polyak-Łojasiewicz (PL) condition. This class contains all strongly convex functions as well as a broad range of non-convex functions, including those used in machine learning applications (see the discussion below). The primary purpose of this note is to show that in the interpolation setting (common in modern over-parametrized machine learning and studied in our previous work [8]), SGD with a fixed step size converges exponentially for functions satisfying the PL condition. To the best of our knowledge, this is the first such exponential convergence result for a class of non-convex functions. Below, we discuss and highlight a number of aspects of the PL condition that differentiate it from the convex setting and make it more relevant to the practice and requirements of many machine learning problems.
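For concreteness, here is a schematic form of the two notions involved (notation ours; constants simplified and not those of the paper's theorem). A loss $f$ with minimum value $f^*$ satisfies the PL condition with parameter $\mu > 0$ if
\[ \tfrac{1}{2}\,\|\nabla f(w)\|^2 \;\ge\; \mu\,\bigl(f(w) - f^*\bigr) \quad \text{for all } w, \]
and exponential (linear) convergence of fixed-step-size SGD means
\[ \mathbb{E}\bigl[f(w_t) - f^*\bigr] \;\le\; (1 - c)^t\,\bigl(f(w_0) - f^*\bigr) \]
for some constant $c \in (0,1)$ depending on the step size, $\mu$, and the smoothness of the losses.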
Nov-5-2018