A Unified Representation Underlying the Judgment of Large Language Models
Yi-Long Lu, Jiajun Song, Wei Wang
arXiv.org Artificial Intelligence
A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or on a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations constitute truly independent systems remains an open question. Here we provide evidence for a convergent architecture for evaluative judgment. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we demonstrate that this axis drives a critical mechanism, which we identify as the subordination of reasoning: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. Our discovery offers a mechanistic account for response bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
Nov-5-2025
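The abstract describes finding a dominant evaluative dimension in model activations and intervening on it. As an illustrative sketch only (the paper's actual method is not specified here), a minimal version of this idea can be expressed as extracting the top principal component of hidden states that vary in valence, then shifting an activation along that direction. All data below is synthetic; `true_axis`, the dimensionality, and the steering rule are assumptions for demonstration, not the authors' procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden-state dimensionality

# Synthetic stand-in for hidden states: a planted "valence" direction plus
# noise. In a real study these would be an LLM's internal activations.
true_axis = rng.normal(size=d)
true_axis /= np.linalg.norm(true_axis)

n = 200
valence = rng.choice([-1.0, 1.0], size=n)  # good/bad labels per example
acts = np.outer(valence, true_axis) * 3.0 + rng.normal(scale=0.5, size=(n, d))

# Estimate a dominant evaluative axis as the top principal component of the
# mean-centered activations (one common way to find such a direction).
centered = acts - acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
axis = vt[0]

# The recovered axis should align with the planted direction (up to sign).
alignment = abs(axis @ true_axis)

# Toy "intervention": reflect one activation across the axis hyperplane,
# which exactly flips its projection onto the estimated axis.
x = acts[0]
steered = x - 2.0 * (x @ axis) * axis
```

Under this toy setup, `alignment` is close to 1 because the planted signal dominates the noise, and `steered @ axis` equals `-(x @ axis)` by construction; a real intervention would instead add a scaled steering vector during a forward pass.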