Counterfactual reasoning: an analysis of in-context emergence
Miller, Moritz, Schölkopf, Bernhard, Guo, Siyuan
arXiv.org Artificial Intelligence
Large-scale neural language models exhibit remarkable performance in in-context learning: the ability to learn and reason about the input context on the fly. This work studies in-context counterfactual reasoning in language models, that is, the ability to predict consequences of a hypothetical scenario. We focus on a well-defined, synthetic linear regression task that requires noise abduction. Accurate prediction is based on (1) inferring an unobserved latent concept and (2) copying contextual noise from factual observations. We show that language models are capable of counterfactual reasoning. Further, we enhance existing identifiability results and reduce counterfactual reasoning for a broad class of functions to a transformation on in-context observations. In Transformers, we find that self-attention, model depth, and pre-training data diversity drive performance. Moreover, we provide mechanistic evidence that the latent concept is linearly represented in the residual stream, and we introduce designated noise abduction heads that are central to performing counterfactual reasoning. Lastly, our findings extend to counterfactual reasoning under SDE dynamics and show that Transformers can perform noise abduction on sequential data, providing preliminary evidence on the potential for counterfactual story generation. Our code is available at https://github.com/mrtzmllr/iccr.
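The two steps named in the abstract can be illustrated with a minimal sketch. It assumes a scalar additive-noise model y = beta * x + u with no intercept, which is a simplification chosen here for illustration and not necessarily the paper's exact task specification. The function name `counterfactual_predict` and all variable names are hypothetical, and the closed-form least-squares estimate stands in for whatever a Transformer computes in context.

```python
import numpy as np


def counterfactual_predict(context_x, context_y, x_fact, y_fact, x_cf):
    """Sketch of counterfactual prediction under y = beta * x + u (assumed model).

    Step 1: infer the latent concept beta from in-context (x, y) pairs.
    Step 2: abduct the noise of the factual observation and reuse it
            when predicting the outcome for the hypothetical input x_cf.
    """
    context_x = np.asarray(context_x, dtype=float)
    context_y = np.asarray(context_y, dtype=float)

    # Step 1: least-squares slope through the origin (no intercept assumed).
    beta_hat = np.dot(context_x, context_y) / np.dot(context_x, context_x)

    # Step 2a (abduction): recover the noise behind the factual observation.
    u_hat = y_fact - beta_hat * x_fact

    # Step 2b (action + prediction): change the input, keep the same noise.
    return beta_hat * x_cf + u_hat


# Usage on synthetic data; all values below are arbitrary illustration choices.
rng = np.random.default_rng(0)
beta_true = 1.5
xs = rng.normal(size=50)
ys = beta_true * xs + 0.1 * rng.normal(size=50)

x_fact, u_fact = 0.8, 0.3
y_fact = beta_true * x_fact + u_fact
y_cf = counterfactual_predict(xs, ys, x_fact, y_fact, x_cf=2.0)
print(y_cf)
```

Under this assumed model the prediction reduces to y_fact + beta_hat * (x_cf - x_fact): the factual noise is carried over unchanged and only the effect of the changed input is added.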
Oct-22-2025
- Country:
- Asia
- Europe
- Germany
- Switzerland > Zürich
- Zürich (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- North America
- Canada > Ontario
- Toronto (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Florida > Palm Beach County
- Boca Raton (0.04)
- Oregon > Benton County
- Corvallis (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Health & Medicine (0.67)
- Information Technology > Security & Privacy (0.67)
- Technology: