Goto

Collaborating Authors

 Statistical Learning





How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Neural Information Processing Systems

However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when f(x) is convex. If f(x) is convex, to find a point with gradient norm ε, we design an algorithm SGD3withanear-optimalrate eO(ε 2),improvingthebestknownrateO(ε 8/3) of [17].