Detecting Token-Level Hallucinations Using Variance Signals: A Reference-Free Approach

Kumar, Keshav

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) demonstrate impressive generative abilities across a wide range of tasks but continue to suffer from hallucinations: outputs that are fluent yet factually incorrect. This paper introduces a reference-free, token-level hallucination detection framework that identifies unreliable tokens by analyzing the variance of log-probabilities across multiple stochastic generations. Unlike traditional methods that depend on external references or sentence-level verification, our approach is model-agnostic, interpretable, and computationally efficient, making it suitable for both real-time and post-hoc analysis. We evaluate the proposed method on three diverse datasets (SQuAD v2 for unanswerable questions, XSum for abstractive summarization, and TriviaQA for open-domain question answering) using autoregressive models of increasing scale: GPT-Neo 125M, Falcon 1B, and Mistral 7B. Results show that token-level variance strongly correlates with hallucination behavior, revealing clear distinctions in uncertainty across model sizes. The framework maintains accuracy even under limited sampling conditions and introduces minimal computational overhead, supporting its practicality for lightweight deployment. Overall, this work provides a scalable, reproducible, and fine-grained diagnostic tool for detecting hallucinations in LLMs, with potential extensions to multilingual and real-time generation settings.

Large language models (LLMs) have transformed natural language processing, powering tasks such as summarization, dialogue generation, and open-ended question answering.
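The abstract does not spell out the exact scoring procedure, but its core signal, the variance of each token's log-probability across multiple stochastic generations, can be sketched in a few lines. The alignment of token positions across samples and the flagging threshold below are illustrative assumptions, not the paper's actual configuration:

```python
from statistics import pvariance

def token_variance_scores(logprob_samples):
    """Given K sampled generations' per-token log-probabilities
    (assumed aligned, equal-length lists), return the variance of
    the log-probability at each token position across the samples."""
    return [pvariance(position) for position in zip(*logprob_samples)]

def flag_unreliable(scores, threshold):
    """Return indices of token positions whose variance exceeds the
    threshold -- candidate hallucinated tokens under this heuristic."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy example: three stochastic generations, five aligned token positions.
# Position 2 has highly dispersed log-probs, mimicking an unstable token.
samples = [
    [-0.1, -0.2, -2.5, -0.1, -0.3],
    [-0.1, -0.3, -0.4, -0.2, -0.2],
    [-0.2, -0.2, -4.0, -0.1, -0.4],
]
scores = token_variance_scores(samples)
flagged = flag_unreliable(scores, threshold=0.5)  # flags position 2 only
```

A reference-free detector of this shape needs only the model's own log-probabilities from repeated sampling, which is what makes the approach model-agnostic and cheap relative to external-verification pipelines.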