



On the Convergence of Step Decay Step-Size for Stochastic Optimization

Neural Information Processing Systems

Step decay step-size schedules (held constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an $\mathcal{O}(\ln T/\sqrt{T})$ rate.
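As a rough illustration of the "constant and then cut" pattern, the sketch below computes a step decay schedule: the step size stays fixed for a stage of iterations and is divided by a constant factor at each stage boundary. The parameter names `eta0` (initial step size), `alpha` (decay factor), and `stage_len` (iterations per stage), along with the helper names, are illustrative assumptions. They are not necessarily the paper's exact parameterization or its choice of stage lengths.

```python
def step_decay_step_size(t: int, eta0: float, alpha: float, stage_len: int) -> float:
    """Return the step size for iteration t (0-indexed).

    Hypothetical helper: the step size is eta0 for the first stage_len
    iterations and is divided by alpha at the start of every later stage.
    """
    if t < 0 or stage_len <= 0:
        raise ValueError("t must be >= 0 and stage_len must be > 0")
    if alpha <= 1:
        raise ValueError("alpha must be > 1 so the step size actually decays")
    stage = t // stage_len  # index of the current constant-step stage
    return eta0 / alpha ** stage


def step_decay_schedule(T: int, eta0: float, alpha: float, stage_len: int) -> list[float]:
    """Hypothetical helper: step sizes for iterations 0..T-1."""
    return [step_decay_step_size(t, eta0, alpha, stage_len) for t in range(T)]


if __name__ == "__main__":
    # Halve the step size every 2 iterations, starting from 1.0.
    print(step_decay_schedule(6, eta0=1.0, alpha=2.0, stage_len=2))
```

With `eta0=1.0`, `alpha=2.0`, and `stage_len=2`, the schedule for the first six iterations is `[1.0, 1.0, 0.5, 0.5, 0.25, 0.25]`. In an SGD loop, each iteration would multiply the stochastic gradient by this value.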