Goto

Collaborating Authors

 visual illusion


ColorVisualIllusions: AStatistics-based ComputationalModel

Neural Information Processing Systems

However,neitherthedata nor the tools existed in the past to extensively support these explanations. The era of big data opens a new opportunity to study input-driven approaches. We introduce atool that computes the likelihood ofpatches, given alarge dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.


Color Visual Illusions: A Statistics-based Computational Model

Neural Information Processing Systems

The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.


Do Large Vision-Language Models Distinguish between the Actual and Apparent Features of Illusions?

Shinozaki, Taiga, Doi, Tomoki, Watahiki, Amane, Nishida, Satoshi, Yanaka, Hitomi

arXiv.org Artificial Intelligence

Humans are susceptible to optical illusions, which serve as valuable tools for investigating sensory and cognitive processes. Inspired by human vision studies, research has begun exploring whether machines, such as large vision language models (LVLMs), exhibit similar susceptibilities to visual illusions. However, studies often have used non-abstract images and have not distinguished actual and apparent features, leading to ambiguous assessments of machine cognition. To address these limitations, we introduce a visual question answering (VQA) dataset, categorized into genuine and fake illusions, along with corresponding control images. Genuine illusions present discrepancies between actual and apparent features, whereas fake illusions have the same actual and apparent features even though they look illusory due to the similar geometric configuration. We evaluate the performance of LVLMs for genuine and fake illusion VQA tasks and investigate whether the models discern actual and apparent features. Our findings indicate that although LVLMs may appear to recognize illusions by correctly answering questions about both feature types, they predict the same answers for both Genuine Illusion and Fake Illusion VQA questions. This suggests that their responses might be based on prior knowledge of illusions rather than genuine visual understanding. The dataset is available at https://github.com/ynklab/FILM


Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions

Rostamkhani, Mohammadmostafa, Ansari, Baktash, Sabzevari, Hoorieh, Rahmani, Farzan, Eetemadi, Sauleh

arXiv.org Artificial Intelligence

In recent years, Visual Question Answering (VQA) has made significant strides, particularly with the advent of multimodal models that integrate vision and language understanding. However, existing VQA datasets often overlook the complexities introduced by image illusions, which pose unique challenges for both human perception and model interpretation. In this study, we introduce a novel task called Illusory VQA, along with four specialized datasets: IllusionMNIST, IllusionFashionMNIST, IllusionAnimals, and IllusionChar. These datasets are designed to evaluate the performance of state-of-the-art multimodal models in recognizing and interpreting visual illusions. We assess the zero-shot performance of various models, fine-tune selected models on our datasets, and propose a simple yet effective solution for illusion detection using Gaussian and blur low-pass filters. We show that this method increases the performance of models significantly and in the case of BLIP-2 on IllusionAnimals without any fine-tuning, it outperforms humans. Our findings highlight the disparity between human and model perception of illusions and demonstrate that fine-tuning and specific preprocessing techniques can significantly enhance model robustness. This work contributes to the development of more human-like visual understanding in multimodal models and suggests future directions for adapting filters using learnable parameters.


The Illusion-Illusion: Vision Language Models See Illusions Where There are None

Ullman, Tomer

arXiv.org Artificial Intelligence

Illusions are entertaining, but they are also a useful diagnostic tool in cognitive science, philosophy, and neuroscience. A typical illusion shows a gap between how something "really is" and how something "appears to be", and this gap helps us understand the mental processing that lead to how something appears to be. Illusions are also useful for investigating artificial systems, and much research has examined whether computational models of perceptions fall prey to the same illusions as people. Here, I invert the standard use of perceptual illusions to examine basic processing errors in current vision language models. I present these models with illusory-illusions, neighbors of common illusions that should not elicit processing errors. These include such things as perfectly reasonable ducks, crooked lines that truly are crooked, circles that seem to have different sizes because they are, in fact, of different sizes, and so on. I show that many current vision language systems mistakenly see these illusion-illusions as illusions. I suggest that such failures are part of broader failures already discussed in the literature.


IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models

Shahgir, Haz Sameen, Sayeed, Khondker Salman, Bhattacharjee, Abhik, Ahmad, Wasi Uddin, Dong, Yue, Shahriyar, Rifat

arXiv.org Artificial Intelligence

The advent of Vision Language Models (VLM) has allowed researchers to investigate the visual understanding of a neural network using natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and common-sense reasoning. This naturally led to the question: How do VLMs respond when the image itself is inherently unreasonable? To this end, we present IllusionVQA: a diverse dataset of challenging optical illusions and hard-to-interpret scenes to test the capability of VLMs in two distinct multiple-choice VQA tasks - comprehension and soft localization. GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot and Chain-of-Thought). Human evaluation reveals that humans achieve 91.03% and 100% accuracy in comprehension and localization. We discover that In-Context Learning (ICL) and Chain-of-Thought reasoning substantially degrade the performance of GeminiPro on the localization task. Tangentially, we discover a potential weakness in the ICL capabilities of VLMs: they fail to locate optical illusions even when the correct answer is in the context window as a few-shot example.


HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models

Guan, Tianrui, Liu, Fuxiao, Wu, Xiyang, Xian, Ruiqi, Li, Zongxia, Liu, Xiaoyu, Wang, Xijun, Chen, Lichang, Huang, Furong, Yacoob, Yaser, Manocha, Dinesh, Zhou, Tianyi

arXiv.org Artificial Intelligence

We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision) and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HallusionBench, we benchmarked 13 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion, but also deepens an understanding of these pitfalls. Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://github.com/tianyi-lab/HallusionBench.


InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion

Yang, Haobo, Wang, Wenyu, Cao, Ze, Duan, Zhekai, Liu, Xuchen

arXiv.org Artificial Intelligence

This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation. Leveraging the intriguing realm of visual illusions, we establish a unique dataset, InDL, designed to rigorously test and benchmark these models. Deep learning has witnessed remarkable progress in domains such as computer vision and natural language processing. However, models often stumble in tasks requiring logical reasoning due to their inherent 'black box' characteristics, which obscure the decision-making process. Our work presents a new lens to understand these models better by focusing on their handling of visual illusions -- a complex interplay of perception and logic. We utilize six classic geometric optical illusions to create a comparative framework between human and machine visual perception. This methodology offers a quantifiable measure to rank models, elucidating potential weaknesses and providing actionable insights for model improvements. Our experimental results affirm the efficacy of our benchmarking strategy, demonstrating its ability to effectively rank models based on their logic interpretation ability. As part of our commitment to reproducible research, the source code and datasets will be made publicly available at https://github.com/rabbit-magic-wh/InDL


Which optical illusions can animals see?

National Geographic

Visual illusions remind us that we are not passive decoders of reality but active interpreters. Our eyes capture information from the environment, but our brain can play tricks on us. Perception doesn't always match reality. Scientists have used illusions for decades to explore the psychological and cognitive processes that underlie human visual perception. More recently, evidence is emerging that suggests many animals, like us, can perceive and create a range of visual illusions.


Evolutionary Generation of Visual Motion Illusions

Sinapayen, Lana, Watanabe, Eiji

arXiv.org Artificial Intelligence

Why do we sometimes perceive static images as if they were moving? Visual motion illusions enjoy a sustained popularity, yet there is no definitive answer to the question of why they work. We present a generative model, the Evolutionary Illusion GENerator (EIGen), that creates new visual motion illusions. The structure of EIGen supports the hypothesis that illusory motion might be the result of perceiving the brain's own predictions rather than perceiving raw visual input from the eyes. The scientific motivation of this paper is to demonstrate that the perception of illusory motion could be a side effect of the predictive abilities of the brain. The philosophical motivation of this paper is to call attention to the untapped potential of "motivated failures", ways for artificial systems to fail as biological systems fail, as a worthy outlet for Artificial Intelligence and Artificial Life research.