How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Zeyuan Allen-Zhu

Neural Information Processing Systems 

Neural Information Processing Systems http://nips.cc/