Stagewise Training Accelerates Convergence of Testing Error Over SGD

Dec-26-2025, 04:35:40 GMT–Neural Information Processing Systems

The stagewise training strategy is widely used for learning neural networks: it runs a stochastic algorithm (e.g., SGD) starting with a relatively large step size (a.k.a. learning rate) and geometrically decreases the step size after a number of iterations. It has been observed that stagewise SGD converges much faster than vanilla SGD with a polynomially decaying step size, in terms of both training error and testing error.
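To make the two step-size rules concrete, here is a minimal Python sketch of the schedules the abstract contrasts. The function names, the fixed stage length, and all default values (`eta0`, `stage_length`, `decay`, `power`) are illustrative assumptions and are not taken from the paper, which only says the step size is decreased geometrically "after a number of iterations".

```python
def stagewise_step_size(t, eta0=0.1, stage_length=100, decay=0.5):
    """Stagewise schedule (hypothetical parameters).

    The step size stays constant within a stage and is multiplied by
    `decay` each time a stage of `stage_length` iterations ends.
    """
    stage = t // stage_length  # index of the current stage, starting at 0
    return eta0 * decay ** stage


def polynomial_step_size(t, eta0=0.1, power=0.5):
    """Polynomially decaying schedule: eta0 / (t + 1) ** power."""
    return eta0 / (t + 1) ** power


if __name__ == "__main__":
    # Print both schedules at a few iterations for comparison.
    for t in (0, 99, 100, 200, 1000):
        print(t, stagewise_step_size(t), polynomial_step_size(t))
```

With these assumed defaults, the stagewise rule keeps the initial step size for the whole first stage and then halves it, whereas the polynomial rule starts shrinking from the first iteration.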


