On the Inductive Bias of Stacking Towards Improving Reasoning
Neural Information Processing Systems
Given the increasing scale of model sizes, efficient training strategies like gradual stacking [Gong et al., 2019, Reddi et al., 2023] have garnered interest. Stacking enables efficient training by gradually growing the depth of a model in stages, using the layers of the smaller model from an earlier stage to initialize the deeper model in the next stage. Although efficient for training, the inductive biases induced by such growing approaches are largely unexplored. In this work, we examine this fundamental aspect of gradual stacking, going beyond its efficiency benefits.
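To make the growth step concrete, below is a minimal sketch of one common stacking variant: a deeper model is initialized by copying a block of layers from the shallower model trained in the previous stage. The function name `grow_by_stacking` and the specific rule of duplicating the top layers are illustrative assumptions, not necessarily the exact schedule used by Gong et al. [2019] or Reddi et al. [2023].

```python
import copy
import torch.nn as nn

def grow_by_stacking(layers: nn.ModuleList, num_new: int) -> nn.ModuleList:
    """Grow a stack of layers from depth L to depth L + num_new.

    The new layers are initialized as deep copies of the top
    `num_new` layers of the current (shallower) model, so the next
    training stage starts from the previous stage's weights rather
    than from random initialization.
    """
    start = len(layers) - num_new
    copied = [copy.deepcopy(layers[i]) for i in range(start, len(layers))]
    return nn.ModuleList(list(layers) + copied)

# Stage 1: a shallow 6-layer stack (e.g., transformer blocks).
base = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4) for _ in range(6)]
)
# Stage 2: grow to 12 layers, reusing the top 6 as initialization.
deeper = grow_by_stacking(base, num_new=6)
assert len(deeper) == 12
```

In a full training pipeline, each stage would train the current stack for some budget before the next growth step; the sketch shows only the initialization that transfers weights between stages.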