Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
arXiv.org Artificial Intelligence
Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance on vision-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify a key cause of these hallucinations: the model's over-susceptibility to specific frequency features of the image when detecting objects. In this paper, we introduce Multi-Frequency Perturbations (MFP), a simple, cost-effective, and pluggable method that leverages both the low-frequency and high-frequency features of images to perturb visual feature representations and explicitly suppress redundant frequency-domain features during inference, thereby mitigating hallucinations. Experimental results demonstrate that our method significantly mitigates object hallucinations across various model architectures. Furthermore, as a training-time method, MFP can be combined with inference-time methods to achieve state-of-the-art performance on the CHAIR benchmark.
Mar-19-2025
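The abstract does not spell out how the frequency bands are obtained or perturbed, but the core idea of splitting an image into low- and high-frequency components and perturbing each band independently can be sketched as follows. This is a minimal illustration assuming a PyTorch pipeline; the function names (`frequency_split`, `multi_frequency_perturb`), the radial low-pass cutoff, and the Gaussian noise scale are hypothetical choices for exposition, not the authors' implementation.

```python
import torch


def frequency_split(image: torch.Tensor, cutoff: float = 0.1):
    """Split a (C, H, W) image into low- and high-frequency parts via 2-D FFT.

    `cutoff` is the low-pass radius as a fraction of the smaller spatial
    dimension; it is an illustrative hyperparameter, not from the paper.
    """
    _, H, W = image.shape
    # Shift the spectrum so the zero-frequency (DC) component is centred.
    spectrum = torch.fft.fftshift(torch.fft.fft2(image), dim=(-2, -1))

    # Radial mask around the spectrum centre selects the low-frequency band.
    ys = torch.arange(H, dtype=image.dtype).view(-1, 1) - H / 2
    xs = torch.arange(W, dtype=image.dtype).view(1, -1) - W / 2
    low_mask = ((ys**2 + xs**2).sqrt() <= cutoff * min(H, W)).to(image.dtype)

    low = torch.fft.ifft2(
        torch.fft.ifftshift(spectrum * low_mask, dim=(-2, -1))).real
    high = image - low  # everything outside the low-pass band
    return low, high


def multi_frequency_perturb(image: torch.Tensor,
                            noise_scale: float = 0.05,
                            cutoff: float = 0.1) -> torch.Tensor:
    """Perturb the low- and high-frequency bands independently, so a visual
    encoder trained on the result cannot over-rely on any single band.

    Gaussian noise is one plausible perturbation; the paper's actual
    perturbation scheme may differ.
    """
    low, high = frequency_split(image, cutoff)
    low = low + noise_scale * torch.randn_like(low)
    high = high + noise_scale * torch.randn_like(high)
    return low + high
```

In a training-time setup along these lines, the perturbed image would replace the clean input to the vision encoder; the inference-time suppression of redundant frequency-domain features mentioned in the abstract is a separate mechanism not sketched here.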