Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
arXiv.org Artificial Intelligence
Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance on vision-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify a key cause of these hallucinations: the model's over-susceptibility to specific frequency features of the image when detecting objects. In this paper, we introduce Multi-Frequency Perturbations (MFP), a simple, cost-effective, and pluggable method that leverages both the low-frequency and high-frequency features of images to perturb visual feature representations and explicitly suppress redundant frequency-domain features during inference, thereby mitigating hallucinations. Experimental results demonstrate that our method significantly mitigates object hallucinations across various model architectures. Furthermore, as a training-time method, MFP can be combined with inference-time methods to achieve state-of-the-art performance on the CHAIR benchmark.
Mar-19-2025
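The abstract does not spell out how the frequency bands are obtained or perturbed, but the core idea of splitting an image into low- and high-frequency components and perturbing each band independently can be sketched as follows. This is a minimal illustration assuming a PyTorch pipeline; the function names (`frequency_split`, `multi_frequency_perturb`), the radial low-pass cutoff, and the Gaussian noise scale are hypothetical choices for exposition, not the authors' implementation.

```python
import torch


def frequency_split(image: torch.Tensor, cutoff: float = 0.1):
    """Split a (C, H, W) image into low- and high-frequency parts via 2-D FFT.

    `cutoff` is the low-pass radius as a fraction of the smaller spatial
    dimension; it is an illustrative hyperparameter, not from the paper.
    """
    _, H, W = image.shape
    # Shift the spectrum so the zero-frequency (DC) component is centred.
    spectrum = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))

    # Radial mask around the spectrum centre selects the low-frequency band.
    ys = torch.arange(H, dtype=image.dtype).view(-1, 1) - H / 2
    xs = torch.arange(W, dtype=image.dtype).view(1, -1) - W / 2
    low_mask = ((ys**2 + xs**2).sqrt() <= cutoff * min(H, W)).to(image.dtype)

    low = torch.fft.ifft2(
        torch.fft.ifftshift(spectrum * low_mask, dim=(-2, -1))).real
    high = image - low  # everything outside the low-pass band
    return low, high


def multi_frequency_perturb(image: torch.Tensor,
                            noise_scale: float = 0.05,
                            cutoff: float = 0.1) -> torch.Tensor:
    """Perturb the low- and high-frequency bands independently, so a visual
    encoder trained on the result cannot over-rely on any single band.

    Gaussian noise is one plausible perturbation; the paper's actual
    perturbation scheme may differ.
    """
    low, high = frequency_split(image, cutoff)
    low = low + noise_scale * torch.randn_like(low)
    high = high + noise_scale * torch.randn_like(high)
    return low + high
```

In a training-time setup along these lines, the perturbed image would replace the clean input to the vision encoder; the inference-time suppression of redundant frequency-domain features mentioned in the abstract is a separate mechanism not sketched here.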