Sparse Feature Coactivation Reveals Causal Semantic Modules in Large Language Models
Deng, Ruixuan; Hu, Xiaoyang; Gilberti, Miles; Storks, Shane; Taxali, Aman; Angstadt, Mike; Sripada, Chandra; Chai, Joyce
arXiv.org Artificial Intelligence
We identify semantically coherent, context-consistent network components in large language models (LLMs) using coactivation of sparse autoencoder (SAE) features collected from just a handful of prompts. Focusing on concept-relation prediction tasks, we show that ablating these components for concepts (e.g., countries and words) and relations (e.g., capital city and translation language) changes model outputs in predictable ways, while amplifying these components induces counterfactual responses. Notably, composing relation and concept components yields compound counterfactual outputs. Further analysis reveals that while most concept components emerge from the very first layer, more abstract relation components are concentrated in later layers. Lastly, we show that extracted components more comprehensively capture concepts and relations than individual features while maintaining specificity. Overall, our findings suggest a modular organization of knowledge accessed through compositional operations, and advance methods for efficient, targeted LLM manipulation.
Oct-22-2025
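The abstract does not say exactly how components are extracted from coactivation or how they are ablated and amplified. The sketch below is one plausible reading and is not the authors' procedure. It assumes that per-token SAE activations pooled from a handful of prompts are available as dicts mapping feature id to activation. It links two features when their sets of active token positions overlap with a Jaccard similarity at or above a threshold, then treats connected components of that graph as candidate components. The function names `coactivation_components` and `steer`, the Jaccard criterion, and the `min_jaccard` threshold are all hypothetical. Ablation and amplification are modeled as rescaling a component's activations: a scale of 0 ablates and a scale above 1 amplifies.

```python
from collections import defaultdict
from itertools import combinations


def coactivation_components(activations, min_jaccard=0.5, active_eps=0.0):
    """Group SAE features into candidate components by coactivation.

    HYPOTHETICAL sketch, not the paper's method. `activations` is a list of
    token positions (pooled over a few prompts); each entry is a dict
    {feature_id: activation} for one layer's SAE.
    """
    # Record the token positions at which each feature is active.
    support = defaultdict(set)
    for pos, feats in enumerate(activations):
        for f, a in feats.items():
            if a > active_eps:
                support[f].add(pos)

    # Undirected coactivation graph: connect two features whose active
    # positions overlap with Jaccard similarity >= min_jaccard (assumed rule).
    graph = defaultdict(set)
    for f, g in combinations(sorted(support), 2):
        inter = len(support[f] & support[g])
        union = len(support[f] | support[g])
        if union and inter / union >= min_jaccard:
            graph[f].add(g)
            graph[g].add(f)

    # Connected components via iterative DFS; isolated features are dropped.
    seen, components = set(), []
    for start in sorted(support):
        if start in seen or start not in graph:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components


def steer(features, component, scale):
    """Rescale one position's SAE activations for features in `component`.

    HYPOTHETICAL helper: scale=0.0 ablates the component, scale>1.0 amplifies it.
    Features outside the component are left unchanged.
    """
    members = set(component)
    return {f: (a * scale if f in members else a) for f, a in features.items()}


# Toy usage with made-up activations (not data from the paper).
acts = [
    {1: 2.0, 2: 1.5, 7: 0.3},
    {1: 1.8, 2: 1.1},
    {3: 0.9, 4: 1.2},
    {3: 1.0, 4: 0.8, 7: 0.2},
]
print(coactivation_components(acts, min_jaccard=0.5))  # → [[1, 2], [3, 4]]
print(steer(acts[0], [1, 2], 0.0))  # → {1: 0.0, 2: 0.0, 7: 0.3}
print(steer(acts[0], [1, 2], 3.0))  # → {1: 6.0, 2: 4.5, 7: 0.3}
```

In an actual model, the rescaled activations would still have to be decoded by the SAE and written back into the residual stream at the matching layer, for example through a forward hook; that step depends on the model and SAE library and is left out here. Composing a relation component with a concept component could be approximated by calling `steer` once per component.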