Simulated Self-Assessment in Large Language Models: A Psychometric Approach to AI Self-Efficacy
Jackson, Daniel I, Jensen, Emma L, Hussain, Syed-Amad, Sezgin, Emre
arXiv.org Artificial Intelligence
Self-assessment is a key aspect of reliable intelligence, yet evaluations of large language models (LLMs) focus mainly on task accuracy. We adapted the 10-item General Self-Efficacy Scale (GSES) to elicit simulated self-assessments from ten LLMs across four conditions: no task, computational reasoning, social reasoning, and summarization. GSES responses were highly stable across repeated administrations and randomized item orders. However, models showed significantly different self-efficacy levels across conditions, with aggregate scores lower than human norms. All models achieved perfect accuracy on computational and social questions, whereas summarization performance varied widely. Self-assessment did not reliably reflect ability: several low-scoring models performed accurately, while some high-scoring models produced weaker summaries. Follow-up confidence prompts yielded modest, mostly downward revisions, suggesting mild overestimation in first-pass assessments. Qualitative analysis showed that higher self-efficacy corresponded to more assertive, anthropomorphic reasoning styles, whereas lower scores reflected cautious, de-anthropomorphized explanations. Psychometric prompting provides structured insight into LLM communication behavior but not calibrated performance estimates.
Nov-27-2025
- Country:
- Asia
- Japan (0.04)
- Middle East > Israel
- Jerusalem District > Jerusalem (0.04)
- Europe
- Switzerland (0.04)
- United Kingdom > England
- Berkshire > Windsor (0.04)
- Cambridgeshire > Cambridge (0.04)
- North America
- Costa Rica (0.04)
- United States
- New Jersey (0.04)
- Ohio > Franklin County
- Columbus (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Health & Medicine > Therapeutic Area (0.46)
- Technology: