How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
