Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
LLMs are computationally expensive to pre-train due to their large scale. Model growth, which leverages smaller models to accelerate the training of larger ones, has emerged as a promising approach. However, the viability of these model growth methods for efficient LLM pre-training remains underexplored. This work identifies three critical obstacles: (O1) lack of comprehensive evaluation, (O2) untested viability for scaling, and (O3) lack of empirical guidelines. To tackle O1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting.
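The abstract does not spell out the four atomic growth operators, but a common operator in this line of work is depth-wise stacking: repeating a trained smaller model's layers to initialize a deeper one. The sketch below is a minimal, illustrative PyTorch version under that assumption; the function name `grow_by_stacking` and the `growth_factor` parameter are hypothetical and not taken from the paper.

```python
import copy
import torch.nn as nn

def grow_by_stacking(layers: nn.ModuleList, growth_factor: int = 2) -> nn.ModuleList:
    """Depth-wise stacking sketch: build a deeper model by repeating the
    trained layers of a smaller one. This is one plausible growth operator,
    not necessarily the exact formulation used in the paper."""
    grown = []
    for _ in range(growth_factor):
        # Deep-copy each layer so the grown model's weights can diverge
        # from the small model's during continued pre-training.
        grown.extend(copy.deepcopy(layer) for layer in layers)
    return nn.ModuleList(grown)

# Example: grow a 4-layer "small" transformer stack into an 8-layer one.
small_layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(4)
)
large_layers = grow_by_stacking(small_layers, growth_factor=2)
assert len(large_layers) == 8
```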
Neural Information Processing Systems
May-28-2025, 12:57:44 GMT
- Country:
- Asia (0.14)
- Genre:
- Research Report
- Experimental Study (0.92)
- New Finding (0.92)
- Industry:
- Information Technology (0.45)