Natasha 2: Faster Non-Convex Optimization Than SGD
The diverse world of deep learning research has given rise to numerous architectures for neural networks (convolutional networks, long short-term memory networks, etc.). However, to this date, the underlying training algorithms for neural networks are still stochastic gradient descent (SGD) and its heuristic variants. In this paper, we address the problem of designing a new algorithm that has a provably faster running time than the best known result for SGD.
Neural Information Processing Systems
Dec-31-2018