Understanding In-context Learning of Addition via Activation Subspaces

Hu, Xinyan, Yin, Kayo, Jordan, Michael I., Steinhardt, Jacob, Chen, Lijie

Oct-10-2025–arXiv.org Artificial Intelligence

To perform few-shot learning, language models extract signals from a few input-label pairs, aggregate these into a learned prediction rule, and apply this rule to new inputs. How is this implemented in the forward pass of modern transformer models? To explore this question, we study a structured family of few-shot learning tasks for which the true prediction rule is to add an integer $k$ to the input. We introduce a novel optimization method that localizes the model's few-shot ability to only a few attention heads. We then perform an in-depth analysis of individual heads, via dimensionality reduction and decomposition. As an example, on Llama-3-8B-instruct, we reduce its mechanism on our tasks to just three attention heads with six-dimensional subspaces, where four dimensions track the unit digit with trigonometric functions at periods $2$, $5$, and $10$, and two dimensions track magnitude with low-frequency components. To deepen our understanding of the mechanism, we also derive a mathematical identity relating ``aggregation'' and ``extraction'' subspaces for attention heads, allowing us to track the flow of information from individual examples to a final aggregated concept. Using this, we identify a self-correction mechanism where mistakes learned from earlier demonstrations are suppressed by later demonstrations. Our results demonstrate how tracking low-dimensional subspaces of localized heads across a forward pass can provide insight into fine-grained computational structures in language models.

accuracy, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

Oct-10-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States > Minnesota (0.28)

Genre:
- Research Report > New Finding (0.86)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.94)
  - Representation & Reasoning > Optimization (0.66)
  - Machine Learning
    - Statistical Learning (0.66)
    - Neural Networks > Deep Learning (0.39)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found