
Natasha 2: Faster Non-Convex Optimization Than SGD

Neural Information Processing Systems

The diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional ones, long short-term memory ones, etc.). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants. In this paper, we address the problem of designing a new algorithm that has a provably faster running time than the best known result for SGD.


Natasha 2: Faster Non-Convex Optimization Than SGD

arXiv.org Machine Learning

We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best previous result was essentially $O(\varepsilon^{-4})$, achieved by SGD. More broadly, the algorithm finds $\varepsilon$-approximate local minima of any smooth nonconvex function at rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.
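
As a rough illustration of the target condition, the sketch below checks whether a point looks like an approximate local minimum: a small gradient norm and a Hessian with no strongly negative eigenvalue. The helper name `is_approx_local_min` and the curvature tolerance `delta` are assumptions made for this example; the abstract does not state how the Hessian tolerance relates to $\varepsilon$. The last lines compare the two stated rates while ignoring the constants hidden by the $O(\cdot)$ notation, so the ratio is only indicative.

```python
import numpy as np


def is_approx_local_min(grad, hess, eps, delta):
    """Check ||grad|| <= eps and lambda_min(hess) >= -delta.

    `delta` is a hypothetical curvature tolerance chosen by the caller.
    """
    grad_small = np.linalg.norm(grad) <= eps
    min_eig = np.linalg.eigvalsh(hess).min()  # hess is assumed symmetric
    return bool(grad_small and min_eig >= -delta)


# f(x, y) = x^2 - y^2 has a saddle at the origin: zero gradient, negative curvature.
saddle = is_approx_local_min(np.zeros(2), np.diag([2.0, -2.0]), eps=1e-2, delta=0.1)

# f(x, y) = x^2 + y^2 has a true minimum at the origin.
bowl = is_approx_local_min(np.zeros(2), np.diag([2.0, 2.0]), eps=1e-2, delta=0.1)

# Ratio of the two stated rates, eps^-4 versus eps^-3.25, with constants ignored.
eps = 1e-2
ratio = eps ** -4 / eps ** -3.25
print(saddle, bowl, ratio)
```

The saddle point passes the gradient test but fails the curvature test, which is why a guarantee about local minima is stronger than one about stationary points alone.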


Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter

arXiv.org Machine Learning

Given a nonconvex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depends on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly nonconvex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known (offline) methods for a range of values of the parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ such that the currently fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ behave differently: the former scale with $n^{2/3}$ and the latter scale with $n^{3/4}$.
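
To make the setting concrete, here is a minimal sketch of a finite-sum objective $f(x) = \frac{1}{n}\sum_i f_i(x)$ together with its nonconvexity parameter $\sigma$. The quadratic components $f_i(x) = \tfrac{1}{2} x^\top A_i x$, the random matrices `A`, and the helper names are all assumptions chosen for illustration; they are not taken from the paper, and the code does not implement the Natasha method itself. It only shows what a stochastic gradient and $\sigma$ look like in this toy case.

```python
import numpy as np


def nonconvexity_sigma(hessian):
    """Return sigma >= 0 such that the smallest Hessian eigenvalue is -sigma.

    Returns 0.0 when the (symmetric) Hessian has no negative eigenvalue.
    """
    lam_min = np.linalg.eigvalsh(hessian).min()
    return max(0.0, -float(lam_min))


rng = np.random.default_rng(0)
n, d = 50, 4

# Hypothetical components f_i(x) = 0.5 * x^T A_i x with symmetric A_i.
mats = rng.standard_normal((n, d, d))
A = (mats + mats.transpose(0, 2, 1)) / 2


def full_gradient(x):
    """Gradient of f: the average of the component gradients A_i x."""
    return np.einsum("nij,j->i", A, x) / n


def stochastic_gradient(x, rng):
    """Unbiased gradient estimate from one uniformly sampled component."""
    i = rng.integers(n)
    return A[i] @ x


# For this quadratic toy problem the Hessian of f is constant: the mean of the A_i.
hessian = A.mean(axis=0)
sigma = nonconvexity_sigma(hessian)

x = rng.standard_normal(d)
print(sigma, full_gradient(x), stochastic_gradient(x, rng))
```

For a general smooth $f$ the Hessian varies with $x$, so $\sigma$ would bound the most negative eigenvalue over all points rather than at a single one.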