Neural Information Processing Systems
Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features to create higher-order computations that are impossible in shallow models, or do they merely spread the same kinds of computation out over more layers? To address these questions, we analyze the residual stream of the Llama 3.1, Qwen 3, and OLMo 2 families of models. We find: First, comparing the output of the sublayers to the residual stream reveals that layers in the second half contribute much less than those in the first half, with a clear phase transition between the two halves.
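As a rough illustration of what comparing sublayer outputs to the residual stream could look like, the sketch below computes, for each layer, the ratio of the L2 norm of the sublayer's output to the L2 norm of the residual stream it is added to. This particular ratio, the function names, and the toy vectors are assumptions made for illustration; they are not the paper's exact metric or data.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def relative_contributions(embedding, sublayer_outputs):
    """Return one ratio per layer: ||sublayer output|| / ||residual before it||.

    The residual stream starts at `embedding`, and each sublayer output is
    added to it in turn, as in a standard pre-norm transformer.
    This ratio is an assumed stand-in metric, not the paper's definition.
    """
    residual = list(embedding)
    ratios = []
    for out in sublayer_outputs:
        ratios.append(l2_norm(out) / l2_norm(residual))
        # Update the residual stream with this sublayer's contribution.
        residual = [r + o for r, o in zip(residual, out)]
    return ratios


# Made-up toy vectors (hypothetical, not measurements from any model):
# the first two "layers" write large updates, the last two write small ones.
embedding = [3.0, 4.0]
sublayer_outputs = [
    [3.0, 4.0],
    [6.0, 8.0],
    [1.2, 1.6],
    [1.32, 1.76],
]

ratios = relative_contributions(embedding, sublayer_outputs)
for layer, ratio in enumerate(ratios):
    print(f"layer {layer}: relative contribution {ratio:.2f}")
```

In this toy setup the later layers have a much smaller ratio than the earlier ones, which is the kind of pattern the abstract describes; with a real model, the vectors would come from hooked activations rather than hand-written lists.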
Jun-23-2026, 01:21:53 GMT
- Country:
- Asia (1.00)
- Europe (0.67)
- North America > United States > California > Los Angeles County (0.27)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Technology: