Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
Shunqi Mao, Chaoyi Zhang, Weidong Cai
arXiv.org Artificial Intelligence
Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Efforts to address this issue without model finetuning primarily mitigate hallucination by reducing biases contrastively or amplifying the weights of visual embeddings during decoding. However, these approaches improve visual perception at the cost of impairing the language reasoning capability. In this work, we propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens based on attention and magnifies the corresponding regions, spurring the model to concentrate on fine-grained visual details during decoding. Specifically, by magnifying critical regions while preserving the structural and contextual information at each decoding step, PM allows the VLM to enhance its scrutiny of the visual input, hence producing more accurate and faithful responses. Extensive experimental results demonstrate that PM not only achieves superior hallucination mitigation but also enhances language generation while preserving strong reasoning capabilities. Code is available at https://github.com/ShunqiM/PM.
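The abstract describes an attention-guided crop-and-zoom step at decoding time. The sketch below is a minimal illustration of that general idea, not the authors' implementation (which is available at the linked repository): the function name `magnify_relevant_region`, the top-k patch selection rule, the fixed 24x24 patch grid, and bilinear upsampling are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def magnify_relevant_region(
    image: torch.Tensor,       # (3, H, W) input image tensor
    attn: torch.Tensor,        # (num_patches,) attention over the visual tokens
    grid_size: int = 24,       # visual tokens form a grid_size x grid_size patch grid
    keep_ratio: float = 0.25,  # fraction of highest-attention patches kept (assumed rule)
) -> torch.Tensor:
    """Crop the image region covered by the most-attended visual tokens and
    upsample it back to the model's input resolution (the "magnification")."""
    _, H, W = image.shape
    k = max(1, int(keep_ratio * attn.numel()))
    topk = attn.topk(k).indices              # flat indices of the most-attended patches

    # Convert flat patch indices to (row, col) positions on the patch grid.
    rows, cols = topk // grid_size, topk % grid_size
    ph, pw = H // grid_size, W // grid_size  # pixel size of one patch

    # Take the bounding box covering all selected patches, so the crop keeps
    # surrounding structural context rather than isolated patches.
    y0, y1 = int(rows.min()) * ph, (int(rows.max()) + 1) * ph
    x0, x1 = int(cols.min()) * pw, (int(cols.max()) + 1) * pw
    crop = image[:, y0:y1, x0:x1]

    # Upsample the crop to the original resolution: the relevant region now
    # occupies more pixels, and hence more visual tokens, when re-encoded.
    magnified = F.interpolate(
        crop.unsqueeze(0), size=(H, W), mode="bilinear", align_corners=False
    )
    return magnified.squeeze(0)

# Example: a 336x336 image with a 24x24 patch grid (576 visual tokens).
image = torch.rand(3, 336, 336)
attn = torch.rand(24 * 24).softmax(dim=0)
zoomed = magnify_relevant_region(image, attn)
print(zoomed.shape)  # torch.Size([3, 336, 336])
```

Cropping a single bounding box rather than masking individual patches mirrors the abstract's point about preserving structural and contextual information around the magnified region; in the paper this step would be repeated at each decoding step with the model's own attention maps.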
Mar-13-2025
- Genre:
  - Research Report > New Finding (0.48)
- Technology:
  - Information Technology > Artificial Intelligence
    - Cognitive Science > Problem Solving (0.68)
    - Machine Learning > Neural Networks > Deep Learning (0.93)
    - Natural Language > Chatbot (1.00)
    - Natural Language > Large Language Model (1.00)
    - Representation & Reasoning (1.00)
    - Vision (1.00)