Decomposing Attention To Find Context-Sensitive Neurons
arXiv.org Artificial Intelligence · Oct 7, 2025
We study transformer language models, analyzing attention heads whose attention patterns are spread out and whose attention scores depend weakly on content. We argue that the softmax denominators of these heads are stable when the underlying token distribution is fixed. By sampling softmax denominators from a "calibration text", we can combine the outputs of multiple such stable heads in the first layer of GPT2-Small, approximating their combined output by a linear summary of the surrounding text. This approximation enables a procedure in which, from the weights alone and a single calibration text, we can uncover hundreds of first-layer neurons that respond to high-level contextual properties of the surrounding text, including neurons that did not activate on the calibration text.
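For concreteness, here is a minimal NumPy sketch of the pipeline the abstract describes. It is our illustration, not the authors' code: the weight arrays (`W_V`, `W_O`, `W_E`, `w_in`), the random calibration scores, and the toy dimensions are hypothetical stand-ins, whereas the real procedure uses GPT2-Small's actual first-layer weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, n_heads = 768, 64, 4  # GPT2-Small-like head sizes
d_vocab = 5000                         # toy vocabulary for the sketch

# Hypothetical stand-ins for the learned weights of a few first-layer heads
# whose attention is spread out and depends weakly on content.
W_V = rng.normal(0.0, 0.02, size=(n_heads, d_model, d_head))  # value maps
W_O = rng.normal(0.0, 0.02, size=(n_heads, d_head, d_model))  # output maps
W_E = rng.normal(0.0, 0.02, size=(d_vocab, d_model))          # token embeddings
w_in = rng.normal(0.0, 0.02, size=(d_model,))                 # one neuron's input weights

# Step 1: sample each head's softmax denominator from a single calibration
# text (faked here with random attention scores over 512 positions). The
# paper's claim is that these denominators are stable when the underlying
# token distribution is fixed, so one sample suffices.
calib_scores = rng.normal(0.0, 1.0, size=(n_heads, 512))
Z = np.exp(calib_scores).sum(axis=-1)  # one denominator per head

# Step 2: with Z frozen, head h's output is (1/Z[h]) * sum_s exp(s_hs) * x_s @ W_V[h] @ W_O[h],
# which is linear in the context embeddings x_s. Summing over heads yields a
# single d_model x d_model map from a (weighted) mean context embedding to
# the heads' combined output: the "linear summary of the surrounding text".
M = sum(W_V[h] @ W_O[h] / Z[h] for h in range(n_heads))

# Step 3: from the weights alone, score how strongly each vocabulary item,
# when present in the context, would excite the neuron through this path.
token_scores = W_E @ M @ w_in               # shape (d_vocab,)
top_tokens = np.argsort(token_scores)[-10:]
print(top_tokens)  # ids of the context tokens that most excite the neuron
```

The structure is the point of the sketch: once the denominators `Z` are frozen, each stable head collapses to a fixed linear map, so a neuron's sensitivity to the surrounding text can be read off from a single matrix product rather than by running the model on activating examples.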