Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Neural Information Processing Systems
As both model and dataset sizes continue to scale rapidly, conventional pretraining strategies that assume a fixed compute budget, such as cosine learning rate schedules, are increasingly inadequate for large-scale training.
Jun-22-2026, 08:12:42 GMT