On the Inductive Bias of Stacking Towards Improving Reasoning

Neural Information Processing Systems 

In this work, we look beyond the efficiency benefits of gradual stacking and examine its inductive bias.