How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Luo, Kairong; Sun, Zhenbo; Wen, Haodong; Shi, Xinyu; Cui, Jiarui; Dang, Chenyi; Lyu, Kaifeng; Chen, Wenguang
arXiv.org Artificial Intelligence
Due to the scarcity of high-quality data, large language models (LLMs) are often trained on mixtures of data with varying quality levels, even after sophisticated data curation. A natural approach to better leverage high-quality data is curriculum-based pretraining, where the model is trained on data sorted in ascending order of quality as determined by a quality metric. However, prior studies have reported limited improvements from such curriculum-based pretraining strategies. This work identifies a critical factor constraining these methods: the incompatibility between the ascending data quality order and the decaying learning rate (LR) schedule. We find that while curriculum-based training substantially outperforms random shuffling when using a constant LR, its advantage diminishes under standard LR decay schedules. Our experiments show this incompatibility can be mitigated by two simple strategies: (1) employing a more moderate LR decay schedule, where the final LR is only moderately smaller than the peak LR, and (2) replacing LR decay with model averaging, i.e., computing a weighted average of the final few checkpoints. By combining these strategies, we improve the average score on a suite of standard benchmarks by 1.64% over random shuffling, without additional data refinement. Validated on 1.5B-parameter models trained over 30B tokens with various data-quality metrics, our findings call for a re-evaluation of curriculum-based LLM pretraining and underscore the potential of co-designing data curricula with optimization methods.
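The two mitigation strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine shape, the `final_lr_ratio` parameter, the function names, and the plain-list representation of parameters are all assumptions made for illustration, and the abstract does not specify the exact schedule or averaging weights used.

```python
import math


def cosine_lr(step, total_steps, peak_lr, final_lr_ratio):
    """Cosine decay from peak_lr down to peak_lr * final_lr_ratio.

    Hypothetical helper: a final_lr_ratio close to 1.0 gives a "moderate"
    decay, while a ratio near 0.0 gives the aggressive decay that the
    abstract reports as clashing with an ascending-quality curriculum.
    Warmup is omitted for brevity.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    final_lr = peak_lr * final_lr_ratio
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


def average_checkpoints(checkpoints, weights=None):
    """Weighted average of the last few checkpoints.

    Each checkpoint is assumed to be a dict mapping a parameter name to a
    flat list of floats. Uniform weights are used when none are given.
    """
    if weights is None:
        weights = [1.0] * len(checkpoints)
    total = sum(weights)
    averaged = {}
    for name in checkpoints[0]:
        size = len(checkpoints[0][name])
        averaged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(weights, checkpoints)) / total
            for i in range(size)
        ]
    return averaged


# Illustrative values only: a moderate schedule ends at half the peak LR.
moderate_final = cosine_lr(step=100, total_steps=100, peak_lr=3e-4, final_lr_ratio=0.5)

# Average two toy checkpoints, weighting the later one more heavily.
merged = average_checkpoints(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    weights=[1.0, 3.0],
)
```

In this sketch the averaging step plays the role that LR decay normally plays late in training, so the high-quality data at the end of the curriculum is still seen at a comparatively large learning rate.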
Nov-25-2025
- Genre:
- Research Report > New Finding (1.00)