AITopics

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Neural Information Processing SystemsNov-20-2025, 17:36:29 GMT

Natasha 2: Faster Non-Convex Optimization Than SGD

Zeyuan Allen-Zhu

In diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional ones, long short term memory ones, etc). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants. In this paper, we address the problem of designing a new algorithm that has provably faster running time than the best known result for SGD.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Natasha 2: Faster Non-Convex Optimization Than SGD

Neural Information Processing SystemsDec-31-2018

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Natasha 2: Faster Non-Convex Optimization Than SGD

Neural Information Processing SystemsDec-31-2018

In diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutionalones, long short term memory ones, etc). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants. In this paper, we address the problem of designing a new algorithm that has provably faster running time than the best known result for SGD.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Natasha 2: Faster Non-Convex Optimization Than SGD

arXiv.org Machine LearningFeb-23-2018

We design a stochastic algorithm to train any smooth neural network to $\varepsilon$-approximate local minima, using $O(\varepsilon^{-3.25})$ backpropagations. The best result was essentially $O(\varepsilon^{-4})$ by SGD. More broadly, it finds $\varepsilon$-approximate local minima of any smooth nonconvex function in rate $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients.

artificial intelligence, machine learning, natasha1, (20 more...)

arXiv.org Machine Learning

1708.08694

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter

arXiv.org Machine LearningSep-5-2017

Given a nonconvex function $f(x)$ that is an average of $n$ smooth functions, we design stochastic first-order methods to find its approximate stationary points. The performance of our new methods depend on the smallest (negative) eigenvalue $-\sigma$ of the Hessian. This parameter $\sigma$ captures how strongly nonconvex $f(x)$ is, and is analogous to the strong convexity parameter for convex optimization. At least in theory, our methods outperform known (offline) methods for a range of parameter $\sigma$, and can also be used to find approximate local minima. Our result implies an interesting dichotomy: there exists a threshold $\sigma_0$ so that the currently fastest methods for $\sigma>\sigma_0$ and for $\sigma<\sigma_0$ have different behaviors: the former scales with $n^{2/3}$ and the latter scales with $n^{3/4}$.

artificial intelligence, machine learning, stationary point, (17 more...)

arXiv.org Machine Learning

1702.00763

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)