eae3af0f5868f0a2eceb74208966d55b-Paper-Conference.pdf

Jun-23-2026, 01:21:53 GMT–Neural Information Processing Systems

Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features to create higher-order computations that are impossible in shallow models, or do they merely spread the same kinds of computation out over more layers? To address these questions, we analyze the residual stream of the Llama 3.1, Qwen 3, and OLMo 2 families of models. We find: First, comparing the output of the sublayers to the residual stream reveals that layers in the second half contribute much less than those in the first half, with a clear phase transition between the two halves.
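
The comparison described in the abstract, sublayer output versus residual stream, can be illustrated with a minimal sketch. This is not the authors' code: the function name `relative_contributions`, the norm-ratio metric, and the synthetic activations are all assumptions made for illustration, and the abstract does not state which exact metric the paper uses. The toy data is constructed so that later layers write smaller updates, simply to show what such a measurement would look like.

```python
import numpy as np


def relative_contributions(residual_inputs, sublayer_outputs):
    """Per layer: mean over tokens of ||sublayer output|| / ||residual input||.

    Hypothetical helper; both arguments are lists of (n_tokens, d_model) arrays.
    """
    ratios = []
    for resid, out in zip(residual_inputs, sublayer_outputs):
        resid_norm = np.linalg.norm(resid, axis=-1)
        out_norm = np.linalg.norm(out, axis=-1)
        ratios.append(float(np.mean(out_norm / resid_norm)))
    return ratios


# Synthetic stand-in for real activations (assumption: no model is loaded here).
rng = np.random.default_rng(0)
n_layers, n_tokens, d_model = 8, 16, 32
resid = rng.normal(size=(n_tokens, d_model))

residual_inputs, sublayer_outputs = [], []
for layer in range(n_layers):
    # Toy assumption: second-half layers write 10x smaller updates.
    scale = 1.0 if layer < n_layers // 2 else 0.1
    out = scale * rng.normal(size=(n_tokens, d_model))
    residual_inputs.append(resid.copy())
    sublayer_outputs.append(out)
    resid = resid + out  # residual connection

ratios = relative_contributions(residual_inputs, sublayer_outputs)
for layer, ratio in enumerate(ratios):
    print(f"layer {layer}: relative contribution {ratio:.3f}")
```

With real models, the two lists would instead be filled from recorded activations, for example via forward hooks on each attention and MLP sublayer.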

large language model, machine learning, natural language

Conferences PDF

Country:
- Asia (1.00)
- Europe (0.67)
- North America > United States
  - California > Los Angeles County (0.27)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.89)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
