Natasha 2: Faster Non-Convex Optimization Than SGD

Feb-14-2020, 11:13:15 GMT–Neural Information Processing Systems

We design a stochastic algorithm to find $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients. The best result before this work was $O(\varepsilon^{-4})$ by stochastic gradient descent (SGD). Papers published at the Neural Information Processing Systems Conference.
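
To give a feel for the gap between the two rates, the following minimal sketch compares the bounds numerically. It assumes the constants and any logarithmic factors hidden by the $O(\cdot)$ notation are equal to 1, which the abstract does not state, so the numbers illustrate asymptotic scaling only and are not predicted running times. The function names are hypothetical and chosen for illustration.

```python
import math


def sgd_bound(eps: float) -> float:
    """Stated SGD rate eps**-4, with the hidden constant assumed to be 1."""
    return eps ** -4.0


def natasha2_bound(eps: float) -> float:
    """Stated Natasha 2 rate eps**-3.25, with the hidden constant assumed to be 1."""
    return eps ** -3.25


def rate_ratio(eps: float) -> float:
    """Ratio of the two bounds, which simplifies to eps**-0.75."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps is expected to lie in (0, 1)")
    return sgd_bound(eps) / natasha2_bound(eps)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(f"eps={eps:g}: ratio of bounds = {rate_ratio(eps):.1f}")
```

Under these assumptions the ratio is $\varepsilon^{-0.75}$, so at $\varepsilon = 10^{-2}$ the stated SGD bound is about 31.6 times larger, and the gap widens as $\varepsilon$ shrinks.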

