Momentum-Based Variance Reduction in Non-Convex SGD
Ashok Cutkosky, Francesco Orabona
Neural Information Processing Systems
Variance reduction has emerged in recent years as a strong competitor to stochastic gradient descent in non-convex problems, providing the first algorithms to improve upon the convergence rate of stochastic gradient descent for finding first-order critical points. However, variance-reduction techniques typically require carefully tuned learning rates and a willingness to use excessively large "mega-batches" in order to achieve their improved results.
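To make the idea concrete, the following is a minimal sketch of a momentum-based variance-reduced update in the style this paper studies (the STORM recursion): each step reuses the same stochastic sample at the current and previous iterate, so the correction term has low variance and no mega-batches or checkpoints are needed. The function names, hyperparameters, and the toy noisy-quadratic objective below are illustrative assumptions, not the paper's adaptive parameter choices.

```python
import numpy as np

def storm_sketch(grad, x0, steps=500, lr=0.1, a=0.1, rng=None):
    """Momentum-based variance-reduced SGD sketch (STORM-style recursion).

    grad(x, seed) must return a stochastic gradient at x whose randomness
    is determined by `seed`, so the same sample can be evaluated at two
    points. Hyperparameters here are fixed for illustration only.
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    # Initial estimate: a plain stochastic gradient.
    d = grad(x, int(rng.integers(1 << 30)))
    for _ in range(steps):
        x_prev = x.copy()
        x = x - lr * d
        seed = int(rng.integers(1 << 30))
        # Key recursion: evaluate the SAME sample at x and x_prev, so the
        # correction (g_new - g_old) is small whenever x moves little.
        g_new = grad(x, seed)
        g_old = grad(x_prev, seed)
        d = g_new + (1 - a) * (d - g_old)
    return x
```

On a noisy quadratic (gradient `x` plus Gaussian noise shared through the seed), this recursion drives the iterate close to the minimizer at the origin without any large-batch gradient evaluations.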