MRFD: Multi-Region Fusion Decoding with Self-Consistency for Mitigating Hallucinations in LVLMs

Ge, Haonan, Wang, Yiwei, Yang, Ming-Hsuan, Cai, Yujun

Oct-14-2025–arXiv.org Artificial Intelligence

Large Vision-Language Models (LVLMs) have shown strong performance across multimodal tasks. However, they often produce hallucinations, that is, text inconsistent with the visual input, because of their limited ability to verify information across different regions of the image. To address this, we propose Multi-Region Fusion Decoding (MRFD), a training-free decoding method that improves factual grounding by modeling inter-region consistency. MRFD identifies salient regions using cross-attention, generates an initial response for each region, and computes reliability weights based on the Jensen-Shannon Divergence (JSD) among the responses. These weights guide a consistency-aware fusion of per-region predictions, using region-aware prompts inspired by Chain-of-Thought reasoning. Experiments across multiple LVLMs and benchmarks show that MRFD significantly reduces hallucinations and improves response factuality without requiring model updates.
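The abstract does not give the exact weighting formula, so the following is only a minimal sketch of how JSD-based reliability weights and a weighted fusion could look. It assumes each region's response is summarized as a probability distribution over a shared vocabulary, that a region's reliability decreases with its mean JSD to the other regions, and that fusion is a weighted average. The function names and the `temperature` parameter are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def reliability_weights(dists, temperature=1.0):
    """Assumed scheme: softmax over the negative mean JSD to the other regions.

    A region that disagrees with the rest has a high mean JSD and
    therefore receives a low weight. `temperature` is hypothetical.
    """
    n = len(dists)
    if n == 1:
        return [1.0]
    mean_jsd = [
        sum(js_divergence(dists[i], dists[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    scores = [math.exp(-d / temperature) for d in mean_jsd]
    total = sum(scores)
    return [s / total for s in scores]


def fuse(dists, weights):
    """Weighted average of the per-region distributions (assumed fusion rule)."""
    vocab_size = len(dists[0])
    return [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(vocab_size)]


# Toy example: regions 0 and 1 roughly agree, region 2 is an outlier.
region_dists = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
weights = reliability_weights(region_dists)
fused = fuse(region_dists, weights)
```

In this toy setup the outlier region receives the smallest weight, so the fused distribution leans toward the two regions that agree. A real decoder would apply such a step per generated token to the model's next-token distributions, which this sketch does not attempt.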

large language model, machine learning, natural language

Country:
- Asia
  - Middle East > Israel
    - Tel Aviv District > Tel Aviv (0.04)
  - Singapore (0.04)
- Europe
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Switzerland (0.04)
- North America
  - Mexico > Mexico City
    - Mexico City (0.04)
  - United States > California
    - Los Angeles County > Long Beach (0.04)
- Oceania > Australia
  - Queensland (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language
    - Chatbot (0.46)
    - Large Language Model (0.46)
  - Vision (1.00)
