Stagewise Training Accelerates Convergence of Testing Error Over SGD
Neural Information Processing Systems
The stagewise training strategy is widely used for learning neural networks: a stochastic algorithm (e.g., SGD) starts with a relatively large step size (a.k.a. learning rate) and geometrically decreases the step size after a fixed number of iterations. It has been observed that stagewise SGD converges much faster than vanilla SGD with a polynomially decaying step size, in terms of both training error and testing error.
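The two step-size schedules contrasted in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function names and the constants (initial step size, stage length, decay factor, polynomial exponent) are assumptions chosen for clarity:

```python
def stagewise_step_size(t, eta0=0.1, stage_len=100, decay=0.5):
    """Stagewise schedule: hold the step size constant within each
    stage of `stage_len` iterations, then shrink it geometrically
    by `decay` at every stage boundary."""
    stage = t // stage_len
    return eta0 * (decay ** stage)


def polynomial_step_size(t, eta0=0.1, alpha=0.5):
    """Polynomially decaying schedule: eta_t = eta0 / (t + 1)^alpha,
    i.e. the step size shrinks at every iteration."""
    return eta0 / (t + 1) ** alpha


if __name__ == "__main__":
    # Compare the schedules at a few iteration counts.
    for t in (0, 99, 100, 250):
        print(t, stagewise_step_size(t), polynomial_step_size(t))
```

The qualitative difference is that the stagewise schedule keeps a large step size for an entire stage (allowing fast initial progress) before cutting it, while the polynomial schedule decays continuously from the first iteration.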