Can Large Language Models Express Uncertainty Like Humans?

Linwei Tao, Yi-Fan Yeh, Bo Kai, Minjing Dong, Tao Huang, Tom A. Lamb, Jialin Yu, Philip H. S. Torr, Chang Xu

arXiv.org Artificial Intelligence 

Large language models (LLMs) are increasingly used in high-stakes settings, where overconfident responses can mislead users. Reliable confidence estimation has been shown to enhance trust and task accuracy. Yet existing methods face practical barriers: logits are often hidden, multi-sampling is computationally expensive, and verbalized numerical uncertainty (e.g., giving a 0-100 score) deviates from natural communication. We revisit linguistic confidence (LC), where models express uncertainty through hedging language (e.g., probably, might), offering a lightweight and human-centered alternative. To advance this direction, we 1) release the first diverse, large-scale dataset of hedging expressions with human-annotated confidence scores, and 2) propose a lightweight mapper that converts hedges into confidence scores at near-zero cost. Building on these resources, we 3) conduct the first systematic study of LC across modern LLMs and QA benchmarks, revealing that while most LLMs underperform in expressing reliable LC, carefully designed prompting achieves competitive calibration and discriminability. Finally, we 4) introduce a fine-tuning framework that further improves LC reliability. Taken together, our work positions linguistic confidence as a scalable, efficient, and human-aligned approach to LLM uncertainty estimation, and calls for deeper exploration of this promising yet underexplored direction. The code and dataset are anonymously available at https://anonymous.

Large language models (LLMs) are increasingly deployed in real-world applications, from education and healthcare to law and scientific discovery. While their capabilities make them powerful assistants, LLMs are also prone to hallucinations and factual errors, and human overreliance on their outputs can lead to serious consequences. For instance, a U.S. lawyer once submitted fabricated cases generated by ChatGPT, resulting in professional sanctions (ABC News, 2023). Recent social experiments demonstrate that people adjust their reliance on AI depending on how confident the model appears: reliable expressions of uncertainty can enhance trust, satisfaction, and task accuracy (Kim et al., 2024; Xu et al., 2025). These findings highlight the importance of attaching reliable uncertainty estimates to LLM responses to support human decision-making. Ultimately, how confidence is conveyed plays a central role in shaping trust and guiding human-AI interaction.

A growing body of work explores how to extract and represent confidence in LLM outputs. Logit-based methods are simple and inexpensive but require access to model logits, which are typically unavailable in commercial LLM APIs. An alternative is to prompt the model to verbalize a numerical confidence score; however, such scores rarely align with common user behavior or natural communication, as users do not typically phrase queries with explicit instructions like "Please output your confidence along with the answer."
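To make the hedge-to-confidence mapper idea concrete, below is a minimal sketch of a lexicon-based mapping. The HEDGE_LEXICON values and the map_hedge_to_confidence function are hypothetical illustrations, not the paper's released dataset or mapper; in the paper's setup, the human-annotated confidence scores would replace these placeholder values.

```python
import re

# Hypothetical hedge lexicon; the scores below are illustrative placeholders,
# not the paper's human-annotated confidence annotations.
HEDGE_LEXICON = {
    "almost certainly": 0.90,
    "definitely": 0.95,
    "probably": 0.70,
    "likely": 0.70,
    "possibly": 0.40,
    "might": 0.35,
    "unlikely": 0.20,
}

def map_hedge_to_confidence(response: str, default: float = 0.5) -> float:
    """Return a scalar confidence for a response based on its hedging language.

    Scans the response for known hedge phrases (longest phrase first, so
    multi-word hedges win) and returns the associated score; falls back to
    `default` when no hedge is found.
    """
    text = response.lower()
    for phrase in sorted(HEDGE_LEXICON, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return HEDGE_LEXICON[phrase]
    return default

print(map_hedge_to_confidence("The capital is probably Canberra."))  # 0.7
```

A lookup like this runs at near-zero cost compared with multi-sampling, which is the efficiency argument the abstract makes for linguistic confidence.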
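Likewise, the two evaluation axes named above, calibration and discriminability, are commonly operationalized as expected calibration error (ECE) and the AUROC of confidence scores against answer correctness. The sketch below uses these standard definitions on invented sample data; it is not the paper's evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: bin predictions by confidence, then take the sample-weighted
    average gap between each bin's accuracy and its mean confidence."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

# Invented example: confidences produced by a hedge mapper, plus correctness.
conf = [0.90, 0.70, 0.35, 0.70, 0.95]
correct = [1, 1, 0, 0, 1]
print("ECE:  ", expected_calibration_error(conf, correct))
print("AUROC:", roc_auc_score(correct, conf))  # discriminability
```

Low ECE means the mapped linguistic confidences track empirical accuracy; high AUROC means they separate correct from incorrect answers, which is the sense in which careful prompting is said to be "competitive" above.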
