Do Language Models Mirror Human Confidence? Exploring Psychological Insights to Address Overconfidence in LLMs
Xu, Chenjun, Wen, Bingbing, Han, Bin, Wolfe, Robert, Wang, Lucy Lu, Howe, Bill
arXiv.org Artificial Intelligence
Psychology research has shown that humans are poor at estimating their own performance on tasks, tending toward underconfidence on easy tasks and overconfidence on difficult ones. We examine three LLMs (Llama-3-70B-instruct, Claude-3-Sonnet, and GPT-4o) on a range of QA tasks of varying difficulty and show that their overconfidence differs subtly from the human pattern. The models are less sensitive to task difficulty, and when prompted to answer as different personas (e.g., expert vs. layman, or personas of different races, genders, and ages), they give stereotypically biased confidence estimates even though their underlying answer accuracy stays the same. Based on these observations, we propose Answer-Free Confidence Estimation (AFCE) to improve confidence calibration and LLM interpretability in these settings. AFCE is a self-assessment method that uses two stages of prompting: it first elicits only a confidence score for each question, then separately asks for the answer. Experiments on the MMLU and GPQA datasets, spanning a range of subjects and difficulty levels, show that this separation of tasks significantly reduces overconfidence and yields more human-like sensitivity to task difficulty.
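A minimal sketch of the two-stage AFCE idea, assuming a generic text-in/text-out model function. The prompt wording, the 0 to 100 confidence scale, the helper names (`answer_free_confidence`, `parse_confidence`, `overconfidence`), and the overconfidence score (mean confidence minus accuracy, a common calibration measure) are illustrative assumptions, not the authors' exact prompts or metrics.

```python
import re
from typing import Callable, Sequence

# Hypothetical interface: any function that maps a prompt string to the model's text reply.
LLM = Callable[[str], str]

# Assumed prompt wording; the paper's exact prompts are not reproduced here.
CONFIDENCE_PROMPT = (
    "Question: {question}\nOptions: {options}\n"
    "Do NOT answer the question. On a scale of 0 to 100, how confident are you "
    "that you could answer it correctly? Reply with a single number."
)
ANSWER_PROMPT = (
    "Question: {question}\nOptions: {options}\n"
    "Reply with the letter of the correct option only."
)


def parse_confidence(reply: str) -> float:
    """Extract the first number in the reply and map 0-100 onto [0, 1] (clamped)."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no confidence score found in: {reply!r}")
    return min(max(float(match.group()) / 100.0, 0.0), 1.0)


def answer_free_confidence(llm: LLM, question: str, options: Sequence[str]) -> tuple[float, str]:
    """Two-stage prompting: confidence first (no answer requested), then the answer separately."""
    opts = "; ".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    # Stage 1: elicit only a confidence score; no answer is produced in this call.
    confidence = parse_confidence(llm(CONFIDENCE_PROMPT.format(question=question, options=opts)))
    # Stage 2: an independent prompt asks for the answer.
    answer = llm(ANSWER_PROMPT.format(question=question, options=opts)).strip()
    return confidence, answer


def overconfidence(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean confidence minus accuracy: > 0 means overconfident, < 0 underconfident."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_conf - accuracy


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Stand-in for a real model call: numeric reply to the confidence prompt, a letter otherwise.
        return "80" if "confident" in prompt else "B"

    conf, ans = answer_free_confidence(fake_llm, "What is 2 + 2?", ["3", "4", "5"])
    print(conf, ans)  # 0.8 B
```

Keeping the two calls independent is the point of the design: the confidence estimate is produced before, and without access to, any answer the model generates, which is the separation the abstract credits with reducing overconfidence.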
Jul-29-2025
- Country:
  - Asia
    - Indonesia > Bali (0.04)
    - Middle East
      - Saudi Arabia > Asir Province
        - Abha (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States > Florida
      - Miami-Dade County > Miami (0.04)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Education > Curriculum
    - Subject-Specific Education (1.00)
  - Health & Medicine (0.67)
- Technology: