Measuring Visual Sycophancy in Multimodal Models

Aug-17-2024–arXiv.org Artificial Intelligence

This paper introduces and examines the phenomenon of "visual sycophancy" in multimodal language models, a term we propose to describe these models' tendency to disproportionately favor visually presented information, even when it contradicts their prior knowledge or responses. Our study employs a systematic methodology to investigate this phenomenon: we present models with images of multiple-choice questions, which they initially answer correctly, then expose the same model to versions with visually pre-marked options. Our findings reveal a significant shift in the models' responses towards the pre-marked option despite their previous correct answers. Comprehensive evaluations demonstrate that visual sycophancy is a consistent and quantifiable behavior across various model architectures. Our findings highlight potential limitations in the reliability of these models when processing potentially misleading visual information, raising important questions about their application in critical decision-making contexts.

arxiv preprint arxiv, probability, visual sycophancy, (13 more...)

arXiv.org Artificial Intelligence

Aug-17-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania (0.04)
- Africa > Central African Republic
  - Ombella-M'Poko > Bimbo (0.04)

Genre:
- Research Report > New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.72)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found