On the Convergence of Step Decay Step-Size for Stochastic Optimization

Neural Information Processing Systems 

Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an O(ln T / √T) rate.
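A "constant and then cut" schedule of the kind the abstract describes can be sketched as follows; the function name, parameters, and the halving-every-100-iterations example are illustrative choices, not taken from the paper.

```python
def step_decay(eta0, decay_factor, interval, t):
    """Step decay: hold the step-size constant for `interval` iterations,
    then cut it by `decay_factor` (halving is a common choice)."""
    return eta0 * decay_factor ** (t // interval)

# Example: start at 0.1 and halve every 100 iterations.
sizes = [step_decay(0.1, 0.5, 100, t) for t in range(300)]
print(sizes[0], sizes[100], sizes[250])  # 0.1 0.05 0.025
```

Because the step-size stays piecewise constant, the iterates make fast progress early on and only slow down after each cut, which is the behavior the convergence analysis studies.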
