On the Convergence of Step Decay Step-Size for Stochastic Optimization
Neural Information Processing Systems
Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an O(ln T/√T) rate.
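The "constant and then cut" schedule the abstract describes can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact algorithm; the initial step size `eta0`, decay factor `alpha`, and cut interval `cut_every` are assumed hyperparameters chosen for the example.

```python
def step_decay(eta0: float, alpha: float, cut_every: int, t: int) -> float:
    """Step decay schedule: hold the step size constant for cut_every
    iterations, then cut it by a factor of alpha, and repeat.

    eta0      -- initial step size (assumed hyperparameter)
    alpha     -- decay factor > 1 applied at each cut (assumed)
    cut_every -- number of iterations between cuts (assumed)
    t         -- current iteration index (0-based)
    """
    num_cuts = t // cut_every  # how many times the step size has been cut so far
    return eta0 / (alpha ** num_cuts)


# Example: start at 0.1, halve every 3 iterations.
schedule = [step_decay(0.1, 2.0, 3, t) for t in range(9)]
# → [0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.025, 0.025, 0.025]
```

In SGD, the iterate update at step t would then use this value, e.g. `x -= step_decay(eta0, alpha, cut_every, t) * grad`.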