Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
–Neural Information Processing Systems
As model and dataset sizes continue to scale rapidly, conventional pretraining strategies that assume a fixed compute budget, such as cosine learning rate schedules, are increasingly inadequate for large-scale training.
Jun-14-2026, 00:54:44 GMT