IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
Shahgir, Haz Sameen, Sayeed, Khondker Salman, Bhattacharjee, Abhik, Ahmad, Wasi Uddin, Dong, Yue, Shahriyar, Rifat
arXiv.org Artificial Intelligence
The advent of Vision Language Models (VLMs) has allowed researchers to investigate the visual understanding of neural networks through natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and common-sense reasoning. This naturally raises the question: how do VLMs respond when the image itself is inherently unreasonable? To this end, we present IllusionVQA, a diverse dataset of challenging optical illusions and hard-to-interpret scenes designed to test VLMs on two distinct multiple-choice VQA tasks: comprehension and soft localization. GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot with Chain-of-Thought). Human evaluation shows that humans achieve 91.03% accuracy on comprehension and 100% on localization. We also find that In-Context Learning (ICL) and Chain-of-Thought reasoning substantially degrade the performance of GeminiPro on the localization task. Tangentially, we discover a potential weakness in the ICL capabilities of VLMs: they fail to locate optical illusions even when the correct answer is present in the context window as a few-shot example.
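The results above are multiple-choice accuracies measured under k-shot prompting. As a rough illustration of that kind of evaluation, here is a minimal sketch that assumes a text-only prompt layout, lettered options, and exact-match scoring on the predicted option letter. `MCQItem`, `format_item`, `build_prompt`, and `accuracy` are hypothetical names, not the IllusionVQA authors' code. A real VLM prompt would also interleave the images, which this sketch omits.

```python
import string
from dataclasses import dataclass


@dataclass
class MCQItem:
    # Hypothetical record: one question, its lettered options, and the gold letter.
    question: str
    options: list[str]
    answer: str  # gold option letter, e.g. "b"


def format_item(item: MCQItem, include_answer: bool) -> str:
    """Render one question with lettered options (a., b., ...)."""
    lines = [f"Question: {item.question}"]
    lines += [f"{letter}. {opt}" for letter, opt in zip(string.ascii_lowercase, item.options)]
    # Few-shot examples show their answer; the query leaves it for the model.
    lines.append(f"Answer: {item.answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(query: MCQItem, shots: list[MCQItem]) -> str:
    """Assemble a k-shot prompt: k solved examples followed by the unsolved query."""
    blocks = [format_item(s, include_answer=True) for s in shots]
    blocks.append(format_item(query, include_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: list[str], items: list[MCQItem]) -> float:
    """Percentage of items whose predicted letter matches the gold letter."""
    if len(predictions) != len(items):
        raise ValueError("predictions and items must have the same length")
    correct = sum(p.strip().lower() == it.answer for p, it in zip(predictions, items))
    return 100.0 * correct / len(items)
```

Passing the query itself in `shots` reproduces the situation described in the abstract, where the correct answer is already in the context window as a few-shot example.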
Mar-30-2024