CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base

Nguyen, Cong-Duy, Wu, Xiaobao, Vu, Duc Anh, Zhao, Shuai, Nguyen, Thong, Luu, Anh Tuan

Feb-18-2025–arXiv.org Artificial Intelligence

Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal reasoning capabilities, but they remain susceptible to hallucination, particularly object hallucination where non-existent objects or incorrect attributes are fabricated in generated descriptions. Existing detection methods achieve strong performance but rely heavily on expensive API calls and iterative LVLM-based validation, making them impractical for large-scale or offline use. To address these limitations, we propose CutPaste\&Find, a lightweight and training-free framework for detecting hallucinations in LVLM-generated outputs. Our approach leverages off-the-shelf visual and linguistic modules to perform multi-step verification efficiently without requiring LVLM inference. At the core of our framework is a Visual-aid Knowledge Base that encodes rich entity-attribute relationships and associated image representations. We introduce a scaling factor to refine similarity scores, mitigating the issue of suboptimal alignment values even for ground-truth image-text pairs. Comprehensive evaluations on benchmark datasets, including POPE and R-Bench, demonstrate that CutPaste\&Find achieves competitive hallucination detection performance while being significantly more efficient and cost-effective than previous methods.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

Feb-18-2025

arXiv.org PDF

Add feedback

Country:
- North America > Canada (0.14)
- Asia > Singapore (0.14)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.96)
  - Representation & Reasoning > Expert Systems (0.63)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found