How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
Neural Information Processing Systems
However, in terms of making the gradients small, the original SGD does not give an optimal rate, even when f(x) is convex. If f(x) is convex, to find a point with gradient norm at most ε, we design an algorithm SGD3 with a near-optimal rate Õ(ε^{-2}), improving the best known rate O(ε^{-8/3}) of [17].
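To make the "gradient norm at most ε" stopping criterion concrete, here is a minimal sketch of plain SGD (not the paper's SGD3, whose details are not given here) that runs until the stochastic gradient estimate falls below ε. The objective, step size, and noise model are illustrative assumptions:

```python
import numpy as np

def sgd_until_small_gradient(grad_oracle, x0, lr, eps, max_iters=100_000, seed=0):
    """Run vanilla SGD until the (noisy) gradient estimate has norm <= eps.

    Returns the final iterate and the number of steps taken. This is an
    illustrative stopping rule, not the SGD3 algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(max_iters):
        g = grad_oracle(x, rng)
        if np.linalg.norm(g) <= eps:
            return x, t
        x = x - lr * g  # standard SGD update
    return x, max_iters

# Toy convex objective f(x) = 0.5 * ||x||^2, whose true gradient is x,
# observed through additive Gaussian noise (a hypothetical noise model).
def noisy_grad(x, rng, sigma=0.01):
    return x + sigma * rng.standard_normal(x.shape)

x_final, iters = sgd_until_small_gradient(noisy_grad, np.ones(5), lr=0.1, eps=0.1)
```

On this toy quadratic, the iterate contracts toward the minimizer at the origin, so the stopping condition triggers after a modest number of steps; the point of the paper is that for general convex f, guaranteeing a small gradient (rather than a small function-value gap) requires a more careful analysis and algorithm.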