The Sample Complexity of Gradient Descent in Stochastic Convex Optimization

Livni, Roi

arXiv.org Artificial Intelligence 

Stochastic Convex Optimization (SCO) is a theoretical model that depicts a learner that minimizes a (Lipschitz) convex function, given finite noisy observations of the objective [22]. While often considered simplistic, in recent years SCO has become a focus of theoretical research, partly, because of its importance to the study of first-order optimization methods. But, also, it has become focus of study because it is one of few theoretical settings that exhibit overparameterized learning. In more detail, classical learning theory often focuses on the tension between number of samples, or training data, and the complexity of the model to be learnt. A common wisdom of classical theories [1, 7, 14, 24] is that, to avoid overfitting, the complexity of a model should be adjusted in proportion to the amount of training data. However, recent advances in Machine Learning have challenged this viewpoint. Evidently [18, 25], state-of-the-art algorithms generalize well but without, explicitly, controlling the capacity of the model to be learnt. In turn, today, it is one of the most emerging challenges, for learning theory, to understand learnability when the number of parameters in a learnt model exceeds the number of examples, and when, seemingly, nothing withholds the algorithm from overfitting. Towards this, we look at SCO.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found