Cost-Effective Hallucination Detection for LLMs

Simon Valentin, Jinmiao Fu, Gianluca Detommaso, Shaoyuan Xu, Giovanni Zappella, Bryan Wang

arXiv.org Machine Learning 

Despite their impressive capabilities, large language models (LLMs) can be prone to generating hallucinations -- undesirable outputs that are incorrect, unfaithful, or inconsistent with respect to the inputs (or the output itself) [1]. These unreliable behaviors pose significant risks for adopting LLMs in real-world applications. Detecting hallucinations is challenging because, among other things, they take different forms, are context-dependent, and can conflict with other desirable properties of generated text [2, 3]. Hallucinations may be harmless in some contexts but undesirable or even dangerous in others (e.g., erroneous medical advice). Detecting and quantifying hallucination risk is thus a critical capability for enabling safe applications of LLMs and improving generated outputs. Prior work has proposed various approaches for detecting and mitigating hallucinations in LLM-generated outputs, including verifying faithfulness to inputs [4], assessing internal coherence [5], consulting external knowledge sources [6], and quantifying model uncertainty [2, 3, 7, 8]. However, deploying these methods in production settings is far from trivial due to several challenges: First, there is limited comparative evaluation illuminating how different detection methods perform. Second, existing approaches differ greatly in their computational demands, and guidance on cost-effectiveness trade-offs is lacking to inform method selection for real-world applications under resource constraints. Third, hallucination detection in the real world often requires careful consideration of risks and false positive/negative trade-offs, which calls for methods that provide well-calibrated probability scores.
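
To make the calibration requirement concrete, below is a minimal, illustrative sketch (not the paper's implementation) of turning raw hallucination-detector scores into calibrated probabilities with two standard techniques, Platt scaling and isotonic regression, and comparing them via the Brier score. The synthetic scores and labels are placeholders for a real labeled validation set of (detector score, hallucination yes/no) pairs; scikit-learn and NumPy are assumed to be available.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)

# Placeholder data: 1 = hallucination, and higher raw scores loosely indicate
# hallucination. In practice these would come from a labeled validation set.
n = 2000
labels = rng.integers(0, 2, size=n)
raw_scores = rng.normal(loc=labels.astype(float), scale=1.0)

# Split into a calibration set and a held-out evaluation set.
split = n // 2
cal_s, cal_y = raw_scores[:split], labels[:split]
ev_s, ev_y = raw_scores[split:], labels[split:]

# Platt scaling: fit a one-dimensional logistic regression on the raw scores.
platt = LogisticRegression().fit(cal_s.reshape(-1, 1), cal_y)
platt_probs = platt.predict_proba(ev_s.reshape(-1, 1))[:, 1]

# Isotonic regression: a monotonic, non-parametric score-to-probability map.
iso = IsotonicRegression(out_of_bounds="clip").fit(cal_s, cal_y)
iso_probs = iso.predict(ev_s)

# Lower Brier score indicates better-calibrated probabilistic predictions,
# which is what risk-sensitive thresholding on false positives/negatives needs.
print("Brier (Platt):   ", brier_score_loss(ev_y, platt_probs))
print("Brier (isotonic):", brier_score_loss(ev_y, iso_probs))
```

Calibrated probabilities of this kind let an application set decision thresholds directly from its own false-positive/false-negative cost trade-off, rather than from arbitrary raw detector scores.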
