Natasha 2: Faster Non-Convex Optimization Than SGD

Allen-Zhu, Zeyuan

Neural Information Processing Systems 

The diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional ones, long short-term memory ones, etc.). However, to this date, the underlying training algorithm for neural networks is still stochastic gradient descent (SGD) and its heuristic variants. In this paper, we address the problem of designing a new algorithm with a provably faster running time than the best known result for SGD.
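For context on the baseline the paper compares against, below is a minimal sketch of a vanilla SGD update on a toy least-squares problem. This is not the Natasha 2 algorithm from the paper; the function names, learning rate, and data are illustrative assumptions.

```python
# A minimal sketch of vanilla SGD (the baseline, not the paper's Natasha 2
# algorithm): parameters move against a per-sample stochastic gradient.
# The objective, learning rate, and data below are illustrative assumptions.
import numpy as np

def sgd(grad_fn, x0, data, lr=0.01, epochs=10, seed=0):
    """grad_fn(x, sample) returns a stochastic gradient of the loss at x."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x -= lr * grad_fn(x, data[i])  # one stochastic gradient step
    return x

# Example: per-sample least-squares loss 0.5 * (a @ x - b)^2 for sample (a, b).
data = [(np.array([1.0, 2.0]), 3.0), (np.array([2.0, 1.0]), 3.0)]
grad = lambda x, s: (s[0] @ x - s[1]) * s[0]
print(sgd(grad, x0=[0.0, 0.0], data=data, lr=0.1, epochs=100))
```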
