How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Dec-31-2018–Neural Information Processing Systems

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm at most $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. If $f(x)$ is nonconvex, to find its $\varepsilon$-approximate local minimum, we design an algorithm SGD5 with rate $\tilde{O}(\varepsilon^{-3.5})$, where previous SGD variants only achieve $\tilde{O}(\varepsilon^{-4})$. SGD5 is no slower than the best known stochastic version of Newton's method in all parameter regimes.
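To make the quantity in the abstract concrete, the sketch below runs plain SGD on a small convex quadratic and reports the true gradient norm $\|\nabla f(x)\|$ at the start and at the end. It is not SGD3 or SGD5, whose details the abstract does not give. The objective, the Gaussian noise model, the $1/\sqrt{t}$ step-size schedule, and the name `sgd_gradient_norm` are all illustrative assumptions.

```python
import numpy as np


def sgd_gradient_norm(steps=2000, dim=5, lr=0.05, noise=0.1, seed=0):
    """Run plain SGD on f(x) = 0.5 * x^T A x.

    Returns the true gradient norm at the initial point and at the last iterate.
    """
    rng = np.random.default_rng(seed)
    # Convex quadratic with eigenvalues spread over [0.1, 1.0] (assumed for illustration).
    A = np.diag(np.linspace(0.1, 1.0, dim))
    x = np.ones(dim)
    start_norm = np.linalg.norm(A @ x)
    for t in range(steps):
        # Unbiased stochastic gradient: true gradient plus zero-mean Gaussian noise.
        g = A @ x + noise * rng.standard_normal(dim)
        # Decaying step size lr / sqrt(t + 1), a common textbook choice for SGD.
        x = x - lr / np.sqrt(t + 1) * g
    end_norm = np.linalg.norm(A @ x)
    return start_norm, end_norm


if __name__ == "__main__":
    start, end = sgd_gradient_norm()
    print(f"gradient norm: start={start:.4f}, end={end:.4f}")
```

Counting how many stochastic gradient evaluations such a loop needs before the gradient norm drops to $\varepsilon$ is the kind of rate the abstract compares across algorithms.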


