How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Allen-Zhu, Zeyuan

Neural Information Processing Systems 

Stochastic gradient descent (SGD) gives an optimal convergence rate when minimizing convex stochastic objectives $f(x)$. However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when $f(x)$ is convex. If $f(x)$ is convex, to find a point with gradient norm at most $\varepsilon$, we design an algorithm SGD3 with a near-optimal rate $\tilde{O}(\varepsilon^{-2})$, improving the best known rate $O(\varepsilon^{-8/3})$. This is no slower than the best known stochastic version of Newton's method in all parameter regimes.
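To make the setting concrete, here is a minimal sketch of the baseline the abstract contrasts against: plain SGD with the standard $O(1/\sqrt{t})$ step size on a convex stochastic least-squares objective, reporting the gradient norm at the final iterate. This is not the paper's SGD3 algorithm; the problem instance, step-size schedule, and constants below are illustrative assumptions only.

```python
import numpy as np

# Baseline illustration (not SGD3): plain SGD on the convex stochastic objective
#   f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2,
# monitoring how small the gradient norm is at the returned point.

rng = np.random.default_rng(0)
n, d = 1000, 20                       # number of samples and dimension (illustrative)
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def stochastic_grad(x, i):
    """Gradient of the single component 0.5 * (a_i . x - b_i)^2."""
    a_i = A[i]
    return (a_i @ x - b_i) * a_i

def full_grad(x):
    """Full gradient of f, used only to monitor the gradient norm."""
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
T = 5000                              # number of stochastic steps (illustrative)
for t in range(1, T + 1):
    i = rng.integers(n)
    eta = 1.0 / np.sqrt(t)            # standard O(1/sqrt(t)) SGD step size
    x -= eta * stochastic_grad(x, i)

print("gradient norm at final iterate:", np.linalg.norm(full_grad(x)))
```

Under this baseline, the objective value converges at the optimal rate, while the gradient norm of the returned point shrinks more slowly; closing that gap to the near-optimal $\tilde{O}(\varepsilon^{-2})$ complexity is what SGD3 is designed for.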