In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He

arXiv.org Artificial Intelligence 

Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechanisms of LLM hallucinations from the perspective of inner representations, and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in the hidden states of the in-context tokens, compared to incorrect ones. Leveraging this insight, we propose an entropy-based metric to quantify the "sharpness" among the in-context hidden states and incorporate it into the decoding process to formulate a constrained decoding approach. Experiments on various knowledge-seeking and hallucination benchmarks demonstrate our approach's consistent effectiveness, for example, achieving up to an 8.6-point improvement on TruthfulQA. We believe this study can improve our understanding of hallucinations and serve as a practical solution for hallucination mitigation.

Large language models (LLMs) have made remarkable advancements in recent years, with extensive applications across various domains (OpenAI, 2022; 2023; Kaddour et al., 2023). Despite these advances, LLMs still face notable challenges regarding factuality, which can critically undermine their trustworthiness and reliability, as highlighted in recent studies (Chen et al., 2023; Ji et al., 2023; Wang et al., 2023). To address the factuality issue, many efforts have focused on retrieving external knowledge (Ram et al., 2023b; Yu et al., 2023; Jiang et al., 2023) for generation or fact-checking, as well as on fine-tuning (Asai et al., 2023) and self-evaluation (Pan et al., 2023; Xiong et al., 2024). However, these methods often require high computational resources or high-quality knowledge bases, which may not be available for domain-specific cases. In contrast, we tackle this challenge from the perspective of the model's inner representations, investigating whether the hidden states contain information about hallucinations.
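To make the idea concrete, the following is a minimal sketch of how an entropy-based sharpness score over in-context hidden states might be folded into decoding. It is one plausible instantiation under stated assumptions, not necessarily the paper's exact formulation: the function names (`sharpness_entropy`, `adjust_logits`), the penalty weight `alpha`, and the choice to normalize over context positions are illustrative assumptions.

```python
# Hypothetical sketch: score each next-token candidate by how "sharply" the
# hidden states of the in-context tokens activate on it, then penalize
# diffuse (high-entropy) candidates during decoding. Names are illustrative.
import torch


def sharpness_entropy(context_hidden: torch.Tensor,
                      lm_head: torch.nn.Module,
                      candidate_ids: torch.Tensor) -> torch.Tensor:
    """Entropy of each candidate's activation across context positions.

    context_hidden: (T, d) hidden states of the in-context tokens.
    lm_head:        the model's output projection (vocabulary head).
    candidate_ids:  (K,) token ids of next-token candidates.
    Returns:        (K,) entropy; lower entropy = sharper activation.
    """
    # Project every context hidden state through the LM head: (T, V)
    probs = lm_head(context_hidden).softmax(dim=-1)
    # Probability each context position assigns to each candidate: (T, K)
    p = probs[:, candidate_ids]
    # Normalize over context positions to form a distribution per candidate
    p = p / p.sum(dim=0, keepdim=True).clamp_min(1e-12)
    # Low entropy means the activation is concentrated on a few context tokens
    return -(p * p.clamp_min(1e-12).log()).sum(dim=0)


def adjust_logits(next_logits: torch.Tensor,
                  entropy: torch.Tensor,
                  candidate_ids: torch.Tensor,
                  alpha: float = 1.0) -> torch.Tensor:
    """Constrained decoding step: subtract an entropy penalty from the
    original next-token logits of the candidate tokens."""
    adjusted = next_logits.clone()
    adjusted[candidate_ids] = next_logits[candidate_ids] - alpha * entropy
    return adjusted
```

In practice one would presumably restrict `candidate_ids` to the model's top-k next-token predictions at each step and tune `alpha` on a validation set, so that the sharpness signal reranks plausible candidates rather than distorting the full vocabulary distribution.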
