Natasha 2: Faster Non-Convex Optimization Than SGD

Neural Information Processing Systems 

We design a stochastic algorithm that finds $\varepsilon$-approximate local minima of any smooth nonconvex function at a rate of $O(\varepsilon^{-3.25})$,