Source framing triggers systematic evaluation bias in Large Language Models
Germani, Federico, Spitale, Giovanni
arXiv.org Artificial Intelligence
Large Language Models (LLMs) are increasingly used not only to generate text but also to evaluate it, raising urgent questions about whether their judgments are consistent, unbiased, and robust to framing effects. In this study, we systematically examine inter- and intra-model agreement across four state-of-the-art LLMs (OpenAI o3-mini, DeepSeek Reasoner, xAI Grok 2, and Mistral) tasked with evaluating 4,800 narrative statements on 24 different topics of social, political, and public health relevance, for a total of 192,000 assessments. We manipulate the disclosed source of each statement to assess how attribution either to another LLM or to a human author of a specified nationality affects evaluation outcomes. We find that, in the blind condition, different LLMs display a remarkably high degree of inter- and intra-model agreement across topics. However, this alignment breaks down when source framing is introduced. Here we show that attributing statements to Chinese individuals systematically lowers agreement scores across all models, and in particular for DeepSeek Reasoner. Our findings reveal that framing effects can deeply affect text evaluation, with significant implications for the integrity, neutrality, and fairness of LLM-mediated information systems.
May-21-2025