Mitigating Hallucination in Multimodal Reasoning via Functional Attention Control

Haolang Lu, Bolun Chu, WeiYe Fu, Guoshun Nan, Junning Liu, Minghui Pan, Qiankun Li, Yi Yu, Hua Wang, Kun Wang

arXiv.org Artificial Intelligence 

Multimodal large reasoning models (MLRMs) are rapidly advancing vision-language reasoning and are emerging as a foundation for cross-modal intelligence. Yet hallucination remains a persistent failure mode, manifesting as erroneous reasoning chains and misinterpretation of visual content. In this study, we observe that attention heads exhibit a staged division of labor: shallow heads predominantly serve perception, while deeper heads shift toward symbolic reasoning. This division reveals two major causes of hallucination, namely perceptual bias and reasoning drift. To address these issues, we propose a lightweight and interpretable two-step plugin, Functional Head Identification and Class-conditioned Rescaling, which locates perception- and reasoning-oriented heads and regulates their contributions without retraining. Evaluations on three real-world MLRMs (Kimi-VL, Ocean-R1, R1-Onevision), six benchmarks across three domains, and four baselines show that our plugin achieves an average improvement of 5% and up to 15%, with less than 1% additional computation and only 9% of baseline latency. Our approach is fully model-agnostic and significantly enhances both the reliability and interpretability of off-the-shelf MLRMs, thereby enabling their safe deployment in high-stakes applications. Our code is available at https://anonymous.4open.science/r/Functional-Attention-Control.
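To make the two-step plugin concrete, below is a minimal PyTorch sketch of the general idea: score attention heads by how much mass they place on visual tokens (a simplified stand-in for Functional Head Identification), then rescale selected heads' contributions at inference time via forward hooks (Class-conditioned Rescaling). The attention-mass heuristic, module path `model.model.layers[i].self_attn.o_proj`, head indices, and scale factors are illustrative assumptions, not the paper's published configuration.

```python
# Hedged sketch of functional attention control, assuming a Hugging Face-style
# decoder whose attention modules expose an output projection (o_proj).
import torch

def score_head_perceptiveness(attn_weights, visual_token_mask):
    """Step 1 (identification, simplified): score each head by the fraction
    of its attention mass placed on visual tokens.

    attn_weights: [batch, num_heads, query_len, key_len]
    visual_token_mask: bool tensor [key_len], True at image-token positions.
    Returns a [num_heads] score; high -> perception-oriented,
    low -> reasoning-oriented.
    """
    mass_on_visual = attn_weights[..., visual_token_mask].sum(dim=-1)
    return mass_on_visual.mean(dim=(0, 2))  # average over batch and queries

def make_head_rescale_pre_hook(head_dim, head_scales):
    """Step 2 (rescaling): forward pre-hook for o_proj that rescales each
    head's slice of the concatenated head outputs, which is equivalent to
    rescaling that head's contribution to the residual stream.

    head_scales: dict mapping head index -> multiplicative scale.
    """
    def hook(module, inputs):
        (hidden,) = inputs  # [batch, seq_len, num_heads * head_dim]
        hidden = hidden.clone()
        for h, s in head_scales.items():
            hidden[..., h * head_dim:(h + 1) * head_dim] *= s
        return (hidden,)
    return hook

# Hypothetical usage on a loaded `model`: up-weight perception heads in a
# shallow layer and damp drift-prone reasoning heads in a deep layer.
# scales_per_layer = {2: {0: 1.1, 5: 1.1}, 28: {3: 0.85}}
# for layer_idx, scales in scales_per_layer.items():
#     o_proj = model.model.layers[layer_idx].self_attn.o_proj
#     o_proj.register_forward_pre_hook(make_head_rescale_pre_hook(128, scales))
```

Hooking the input of the output projection, rather than the attention probabilities, keeps the intervention training-free and cheap: each head's slice is scaled in place during the forward pass, consistent with the abstract's claim of negligible added computation.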
