TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Danna Zheng, Danyang Liu, Mirella Lapata, Jeff Z. Pan

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLMs' outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground-truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics and achieving results on par with reference-based metrics.

Large-scale language models (LLMs) have recently been in the spotlight due to their impressive performance on various NLP tasks, sparking enthusiasm for potential applications (Kaddour et al., 2023; Bubeck et al., 2023). However, a notable concern has emerged regarding the tendency of LLMs to generate plausible yet incorrect responses (Tam et al., 2022; Liu et al., 2023; Devaraj et al., 2022), which is particularly challenging for users without specialized expertise. Consequently, users are often advised to employ LLMs only in scenarios where they can confidently assess the information provided.
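To make the idea of behavioral consistency concrete, the sketch below scores how often a model re-affirms its own answer when probed again, and optionally combines that with an external fact-checking score. This is an illustrative sketch only: the `ask_model` function, the prompt wording, the yes/no agreement rule, and the equal weighting are assumptions, not the paper's exact procedure.

```python
from typing import Callable, List, Optional


def behavioral_consistency_score(
    original_answer: str,
    probes: List[str],
    ask_model: Callable[[str], str],
) -> float:
    """Fraction of probe queries in which the model re-affirms its original answer.

    `probes` are re-phrasings of the original question; `ask_model` sends a
    prompt to the LLM and returns its text reply (both assumed interfaces).
    """
    agreements = 0
    for probe in probes:
        prompt = (
            f"{probe}\n"
            f"Candidate answer: {original_answer}\n"
            "Is the candidate answer correct? Reply yes or no."
        )
        reply = ask_model(prompt).strip().lower()
        agreements += int(reply.startswith("yes"))
    return agreements / max(len(probes), 1)


def trust_score(consistency: float, fact_check: Optional[float] = None) -> float:
    """Combine behavioral consistency with an optional fact-checking score.

    When no external knowledge source is available, the score is purely
    reference-free; the 50/50 weighting below is an assumed combination.
    """
    if fact_check is None:
        return consistency
    return 0.5 * consistency + 0.5 * fact_check
```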
