Natasha 2: Faster Non-Convex Optimization Than SGD

Feb-23-2018–arXiv.org Machine Learning

We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.

artificial intelligence, machine learning, natasha1, (20 more...)

arXiv.org Machine Learning

Feb-23-2018

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Statistical Learning > Gradient Descent (0.50)
    - Neural Networks > Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found