How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Zeyuan Allen-Zhu

Neural Information Processing Systems 

However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when f(x) is convex. If f(x) is convex, to find a point with gradient norm at most ε, we design an algorithm SGD3 with a near-optimal rate Õ(ε^{-2}), improving the best known rate O(ε^{-8/3}) of [17].
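To illustrate the problem setting (finding a point with small gradient norm, rather than small function value), the following is a minimal sketch of plain SGD with noisy gradients on a convex quadratic; this is the baseline method discussed here, not the paper's SGD3 algorithm, and the step size, noise level, and test function are illustrative assumptions.

```python
import numpy as np

def sgd(grad, x0, lr, steps, noise_std, rng):
    """Plain SGD with additive Gaussian gradient noise.

    Illustration of the baseline only -- not the SGD3 algorithm,
    which achieves the near-optimal rate via a more careful scheme.
    """
    x = x0.copy()
    for _ in range(steps):
        # Stochastic gradient: true gradient plus zero-mean noise.
        g = grad(x) + noise_std * rng.standard_normal(x.shape)
        x = x - lr * g
    return x

# Convex test function f(x) = 0.5 * ||x||^2, so grad f(x) = x
# and the gradient norm at the output is simply ||x||.
rng = np.random.default_rng(0)
x_out = sgd(lambda x: x, x0=np.ones(10), lr=0.05, steps=2000,
            noise_std=0.1, rng=rng)
print(np.linalg.norm(x_out))  # gradient norm ||grad f(x)|| at the output
```

The final gradient norm is driven down to the noise floor set by the step size and noise variance; the paper's point is about how many stochastic gradient evaluations are needed, as a function of ε, to guarantee such a small gradient.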
