Stagewise Training Accelerates Convergence of Testing Error Over SGD

Zhuoning Yuan, Yan Yan, Rong Jin, Tianbao Yang

Neural Information Processing Systems 

However, existing studies have largely ignored the question of how to explain this phenomenon.