Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations

Liu, Xiangrui, Luo, Man, Chatterjee, Agneet, Wei, Hua, Baral, Chitta, Yang, Yezhou

arXiv.org Artificial Intelligence 

Hallucination is a long-standing problem that has been actively investigated in Vision-Language Models (VLMs). Existing research commonly attributes hallucinations to technical limitations or sycophancy bias, where the latter means the models tend to generate incorrect answers to align with user expectations. However, these explanations primarily focus on technical or externally driven factors, and may have neglected the possibility that hallucination behaviours mirror cognitive biases observed in human psychology. In this work, we introduce a psychological taxonomy, categorizing the cognitive biases in VLMs that lead to hallucinations, including sycophancy, logical inconsistency, and a newly identified VLM behaviour: appeal to authority. To systematically analyze these behaviours, we design AIpsych, a scalable benchmark that reveals psychological tendencies in model response patterns. Leveraging this benchmark, we investigate how variations in model architecture and parameter size influence model behaviour when responding to strategically manipulated questions. Our experiments reveal that as model size increases, VLMs exhibit stronger sycophantic tendencies but reduced authority bias, suggesting increasing competence but a potential erosion of response integrity. A human subject study further validates our hypotheses and highlights key behavioural differences between VLMs and human respondents. This work suggests a new perspective for understanding hallucination in VLMs and highlights the importance of integrating psychological principles into model evaluation. The benchmark and code are tested and available at the anonymous link https://anonymous.4open.science/r/AIpsych-666.

Figure 1: Left: a VLM exhibits sycophancy by favouring the questioner's options despite recognising it is a pink cup. Right: a human demonstrates authority bias by accepting the question's framing, also yielding the wrong answer.
VLMs have made remarkable progress, achieving increasingly higher accuracy in visual reasoning tasks and enhancing real-world applications such as image captioning, visual question answering, and multimodal retrieval (Chen et al., 2023).
