Train faster, generalize better: Stability of stochastic gradient descent

Hardt, Moritz, Recht, Benjamin, Singer, Yoram

Feb-7-2016–arXiv.org Machine Learning

The most widely used optimization method in machine learning practice is stochastic gradient method (SGM). Stochastic gradient methods aim to minimize the empirical risk of a model by repeatedly computing the gradient of a loss function on a single training example, or a batch of few examples, and updating the model parameters accordingly. SGM is scalable, robust, and performs well across many different domains ranging from smooth and strongly convex problems to complex non-convex objectives. In a nutshell, our results establish that: Any model trained with stochastic gradient method in a reasonable amount of time attains small generalization error. As training time is inevitably limited in practice, our results help to explain the strong generalization performance of stochastic gradient methods observed in practice. More concretely, we bound the generalization error of a model in terms of the number of iterations that stochastic gradient method took in order to train the model. Our main analysis tool is to employ the notion of algorithmic stability due to Bousquet and Elisseeff [4].

artificial intelligence, machine learning, stability, (14 more...)

arXiv.org Machine Learning

Feb-7-2016

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.28)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found