Attention Head Embeddings with Trainable Deep Kernels for Hallucination Detection in LLMs

Oblovatny, Rodion, Bazarova, Alexandra, Zaytsev, Alexey

Jun-12-2025–arXiv.org Artificial Intelligence

We present a novel approach for detecting hallucinations in large language models (LLMs) by analyzing the probabilistic divergence between prompt and response hidden-state distributions. Counterintuitively, we find that hallucinated responses exhibit smaller deviations from their prompts compared to grounded responses, suggesting that hallucinations often arise from superficial rephrasing rather than substantive reasoning. To enhance sensitivity, we employ deep learnable kernels that automatically adapt to capture nuanced geometric differences between distributions. Our approach outperforms existing baselines, demonstrating state-of-the-art performance on several benchmarks. The method remains competitive even without kernel training, offering a robust, scalable solution for hallucination detection.

In recent years, large language models (LLMs) have been widely adopted in many applications. However, they often generate hallucinations: incorrect or fabricated content that does not match real-world facts or the provided context [1]. The latter case is of special interest, as it refers to incorrect generations in retrieval-augmented generation (RAG) settings, where LLMs rely on retrieved information to answer user queries.
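The idea of scoring a response by a kernel-based divergence between prompt and response hidden states can be illustrated with a small sketch. The abstract does not name the divergence or the kernel, so this example assumes maximum mean discrepancy (MMD) with a fixed Gaussian kernel, standing in for the "without kernel training" setting. The function names, the bandwidth value, and the synthetic arrays used in place of real hidden states are all hypothetical.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of x (n, d) and rows of y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(prompt_states, response_states, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    k_pp = rbf_kernel(prompt_states, prompt_states, bandwidth)
    k_rr = rbf_kernel(response_states, response_states, bandwidth)
    k_pr = rbf_kernel(prompt_states, response_states, bandwidth)
    return k_pp.mean() + k_rr.mean() - 2.0 * k_pr.mean()


# Synthetic stand-ins for per-token hidden states (assumption: 32 tokens, 16 dims).
rng = np.random.default_rng(0)
prompt = rng.normal(size=(32, 16))
near_response = prompt + 0.05 * rng.normal(size=(32, 16))  # close to the prompt
far_response = rng.normal(loc=2.0, size=(32, 16))          # shifted distribution

score_near = mmd_squared(prompt, near_response, bandwidth=4.0)
score_far = mmd_squared(prompt, far_response, bandwidth=4.0)
print(f"near: {score_near:.4f}, far: {score_far:.4f}")
```

Under the finding reported in the abstract, a low divergence score (the "near" case here) would point toward a hallucinated response and a high score toward a grounded one. How hidden states are selected, how the kernel is trained, and what threshold separates the two classes are not specified in this text and are left open in the sketch.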

large language model, machine learning, natural language

Country:
- Europe > Russia (0.28)
- Asia (0.28)

Genre:
- Research Report
  - New Finding (0.46)
  - Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.91)
