Train faster, generalize better: Stability of stochastic gradient descent

Moritz Hardt, Benjamin Recht, Yoram Singer

arXiv.org Machine Learning 

The most widely used optimization method in machine learning practice is the stochastic gradient method (SGM). Stochastic gradient methods aim to minimize the empirical risk of a model by repeatedly computing the gradient of a loss function on a single training example, or a batch of a few examples, and updating the model parameters accordingly. SGM is scalable, robust, and performs well across many different domains, ranging from smooth and strongly convex problems to complex non-convex objectives. In a nutshell, our results establish that any model trained with the stochastic gradient method in a reasonable amount of time attains small generalization error. As training time is inevitably limited in practice, our results help to explain the strong generalization performance observed for stochastic gradient methods. More concretely, we bound the generalization error of a model in terms of the number of iterations that the stochastic gradient method took to train the model. Our main analysis tool is the notion of algorithmic stability due to Bousquet and Elisseeff [4].
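To make the update rule described in the abstract concrete, here is a minimal Python sketch of a single-example stochastic gradient method on a one-parameter least-squares problem. It is an illustration only, not the authors' code: the function names (`sgm_least_squares`, `empirical_risk`, `parameter_gap`), the toy data, the step size, and the iteration counts are all assumptions made for this example. The last helper hints at the stability idea by running the same sequence of sampled indices on two training sets that differ in one example and measuring how far the resulting parameters drift apart.

```python
import random


def sgm_least_squares(xs, ys, steps, step_size, seed=0):
    """Run `steps` single-example stochastic gradient updates on squared loss."""
    rng = random.Random(seed)
    w = 0.0  # scalar model parameter, initialized at zero
    for _ in range(steps):
        i = rng.randrange(len(xs))          # sample one training example uniformly
        grad = (w * xs[i] - ys[i]) * xs[i]  # gradient of 0.5 * (w*x - y)**2 in w
        w -= step_size * grad               # update using that example only
    return w


def empirical_risk(w, xs, ys):
    """Average squared loss of parameter w over the training set."""
    return sum(0.5 * (w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def parameter_gap(xs, ys, xs_alt, ys_alt, steps, step_size, seed=0):
    """Distance between SGM outputs on two datasets, using the same sampled indices."""
    w = sgm_least_squares(xs, ys, steps, step_size, seed)
    w_alt = sgm_least_squares(xs_alt, ys_alt, steps, step_size, seed)
    return abs(w - w_alt)


# Hypothetical toy data generated from y = 2x (noise-free, for illustration).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]

# A neighboring dataset: identical except that the label of one example changes.
ys_alt = list(ys)
ys_alt[1] = 3.0

w = sgm_least_squares(xs, ys, steps=200, step_size=0.1)
gap_short = parameter_gap(xs, ys, xs, ys_alt, steps=5, step_size=0.1)
gap_long = parameter_gap(xs, ys, xs, ys_alt, steps=200, step_size=0.1)
print(w, empirical_risk(w, xs, ys), gap_short, gap_long)
```

Printing `gap_short` and `gap_long` shows how the two runs differ after few and many iterations on this toy problem. The sketch does not reproduce any bound from the paper; it only makes the quantities mentioned in the abstract (iterations, single-example updates, sensitivity to one training example) tangible.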
