Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

May-2-2017–arXiv.org Machine Learning

In large-scale machine learning, the number of data examples is usually very large. To search for the optimal solution, one often uses stochastic gradient methods which only require one (or a small batch of) random example(s) per iteration in order to form an estimator of the full gradient. While full-gradient based methods can enjoy an accelerated (and optimal) convergence rate if Nesterov's momentum trick is used [35-37], theory for stochastic gradient methods are generally lagging behind and less is known for their acceleration. At a high level, momentum is dangerous if stochastic gradients are present. If some gradient estimator is very inaccurate, then adding it to the momentum and moving further in this direction (for every future iteration) may hurt the convergence performance. In other words, when naively equipped with momentum, stochastic gradient methods are "very prune to error accumulation" [25] and do not yield accelerated convergence rates in general.

artificial intelligence, katyusha, machine learning, (15 more...)

arXiv.org Machine Learning

May-2-2017

arXiv.org PDF

Add feedback

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found