Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

May-28-2025, 12:57:44 GMT–Neural Information Processing Systems

LLMs are computationally expensive to pre-train because of their scale. Model growth is a promising way to reduce this cost: it leverages smaller models to accelerate the training of larger ones. However, the viability of model growth methods for efficient LLM pre-training remains underexplored. This work identifies three critical obstacles: (O1) lack of comprehensive evaluation, (O2) untested viability for scaling, and (O3) lack of empirical guidelines. To tackle O1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting.
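
As a rough illustration of what "leveraging smaller models" can mean in practice, the sketch below grows a shallow model into a deeper one by repeating its layers depthwise. This is one possible reading of the "stacking" named in the title, not the paper's implementation: the function name `stack_layers`, the `growth_factor` parameter, and the toy dictionary "layers" are all hypothetical stand-ins for real transformer blocks.

```python
import copy


def stack_layers(small_layers, growth_factor):
    """Return a deeper layer list by repeating the small model's layers.

    Hypothetical helper: `small_layers` stands in for the trained blocks
    of a small model, and `growth_factor` is how many copies to stack.
    """
    if growth_factor < 1:
        raise ValueError("growth_factor must be >= 1")
    grown = []
    for _ in range(growth_factor):
        # Deep-copy so each stacked layer can later be trained independently.
        grown.extend(copy.deepcopy(layer) for layer in small_layers)
    return grown


# Toy "layers": each dict stands in for one block's trained weights.
small = [{"name": f"block{i}", "w": [0.1 * i, 0.2 * i]} for i in range(3)]
large = stack_layers(small, growth_factor=2)

print(len(large))                          # 6
print([layer["name"] for layer in large])  # block0..block2, repeated twice
```

The larger model starts from the small model's weights rather than from a random initialization, which is the intuition behind using growth to save pre-training compute. Copying (rather than sharing) the layers is a design choice in this sketch so that the stacked blocks can diverge during continued training.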

large language model, machine learning, natural language

Country:
- Asia (0.14)

Genre:
- Research Report
  - Experimental Study (0.92)
  - New Finding (0.92)

Industry:
- Information Technology (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.92)
  - Natural Language > Large Language Model (1.00)