AITopics | visual illusion


3D Visual Illusion Depth Estimation

Neural Information Processing Systems, Jun-12-2026, 18:01:33 GMT

In this paper, we reveal that the machine visual system is also seriously fooled by 3D visual illusions, including monocular and binocular depth estimation. In order to explore and analyze the impact of 3D visual illusion on depth estimation, we collect a large dataset containing almost 3k scenes and 200k images to train and evaluate SOTA monocular and binocular depth estimation methods. We also propose a 3D visual illusion depth estimation framework that uses common sense from the vision language model to adaptively fuse depth from binocular disparity and monocular depth. Experiments show that SOTA monocular, binocular, and multi-view depth estimation approaches are all fooled by various 3D visual illusions, while our method achieves SOTA performance.

artificial intelligence, proceedings, visual illusion, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (1.00)
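
As a rough illustration of the fusion idea in the 3D Visual Illusion Depth Estimation abstract above, the sketch below converts stereo disparity to depth with the standard pinhole relation (depth = focal length x baseline / disparity) and then blends it per pixel with a monocular depth map. The weight map stands in for whatever confidence signal the paper derives from its vision language model; the function names, the convex blend, and the example numbers are assumptions for illustration, not the authors' framework.

```python
import math


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard stereo relation: depth = focal * baseline / disparity.

    A non-positive disparity carries no depth information, so it maps to infinity.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px


def fuse_depth(stereo_depth, mono_depth, stereo_weight):
    """Per-pixel convex blend of two depth maps given as nested lists.

    stereo_weight holds values in [0, 1]: 1.0 trusts the binocular estimate,
    0.0 trusts the monocular one. How such weights are produced is assumed
    to happen elsewhere (hypothetical here).
    """
    fused = []
    for s_row, m_row, w_row in zip(stereo_depth, mono_depth, stereo_weight):
        fused.append([w * s + (1.0 - w) * m for s, m, w in zip(s_row, m_row, w_row)])
    return fused


if __name__ == "__main__":
    stereo = [[depth_from_disparity(d, focal_px=1000.0, baseline_m=0.1) for d in (50.0, 25.0)]]
    mono = [[3.0, 3.0]]
    weights = [[1.0, 0.0]]  # trust stereo on the left pixel, monocular on the right
    print(fuse_depth(stereo, mono, weights))
```

A hard 0/1 weight map like the one in the usage example simply selects one source per pixel; intermediate weights give a smooth blend.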


Color Visual Illusions: A Statistics-based Computational Model

Neural Information Processing Systems, Feb-8-2026, 18:51:40 GMT

However, neither the data nor the tools existed in the past to extensively support these explanations. The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.

artificial intelligence, illusion, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)


Color Visual Illusions: A Statistics-based Computational Model

Neural Information Processing Systems, Oct-3-2025, 04:04:15 GMT

The era of big data opens a new opportunity to study input-driven approaches. We introduce a tool that computes the likelihood of patches, given a large dataset to learn from. Given this tool, we present a model that supports the approach and explains lightness and color visual illusions in a unified manner.

illusion, likelihood, visual illusion, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.04)
North America > Canada (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Do Large Vision-Language Models Distinguish between the Actual and Apparent Features of Illusions?

Shinozaki, Taiga, Doi, Tomoki, Watahiki, Amane, Nishida, Satoshi, Yanaka, Hitomi

arXiv.org Artificial Intelligence, Jun-12-2025

Humans are susceptible to optical illusions, which serve as valuable tools for investigating sensory and cognitive processes. Inspired by human vision studies, research has begun exploring whether machines, such as large vision language models (LVLMs), exhibit similar susceptibilities to visual illusions. However, studies often have used non-abstract images and have not distinguished actual and apparent features, leading to ambiguous assessments of machine cognition. To address these limitations, we introduce a visual question answering (VQA) dataset, categorized into genuine and fake illusions, along with corresponding control images. Genuine illusions present discrepancies between actual and apparent features, whereas fake illusions have the same actual and apparent features even though they look illusory due to the similar geometric configuration. We evaluate the performance of LVLMs for genuine and fake illusion VQA tasks and investigate whether the models discern actual and apparent features. Our findings indicate that although LVLMs may appear to recognize illusions by correctly answering questions about both feature types, they predict the same answers for both Genuine Illusion and Fake Illusion VQA questions. This suggests that their responses might be based on prior knowledge of illusions rather than genuine visual understanding. The dataset is available at https://github.com/ynklab/FILM

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.05765

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)


Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions

Rostamkhani, Mohammadmostafa, Ansari, Baktash, Sabzevari, Hoorieh, Rahmani, Farzan, Eetemadi, Sauleh

arXiv.org Artificial Intelligence, Dec-11-2024

In recent years, Visual Question Answering (VQA) has made significant strides, particularly with the advent of multimodal models that integrate vision and language understanding. However, existing VQA datasets often overlook the complexities introduced by image illusions, which pose unique challenges for both human perception and model interpretation. In this study, we introduce a novel task called Illusory VQA, along with four specialized datasets: IllusionMNIST, IllusionFashionMNIST, IllusionAnimals, and IllusionChar. These datasets are designed to evaluate the performance of state-of-the-art multimodal models in recognizing and interpreting visual illusions. We assess the zero-shot performance of various models, fine-tune selected models on our datasets, and propose a simple yet effective solution for illusion detection using Gaussian and blur low-pass filters. We show that this method increases the performance of models significantly and in the case of BLIP-2 on IllusionAnimals without any fine-tuning, it outperforms humans. Our findings highlight the disparity between human and model perception of illusions and demonstrate that fine-tuning and specific preprocessing techniques can significantly enhance model robustness. This work contributes to the development of more human-like visual understanding in multimodal models and suggests future directions for adapting filters using learnable parameters.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.08169

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
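
The Illusory VQA abstract above mentions Gaussian and blur low-pass filters as a preprocessing step. A minimal sketch of a separable Gaussian low-pass filter on a grayscale image is shown below in pure Python. The kernel radius, sigma, edge-replication border handling, and all function names are assumptions chosen for illustration; the paper's actual filter settings are not given in the abstract.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)  # clamp to a valid column
                acc += weight * row[xi]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    """Swap rows and columns of a nested-list image."""
    return [list(column) for column in zip(*image)]


def gaussian_low_pass(image, radius=2, sigma=1.0):
    """Separable Gaussian blur: filter along rows, then along columns."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


if __name__ == "__main__":
    # A checkerboard is dominated by high spatial frequencies, so blurring flattens it.
    checker = [[float((x + y) % 2) for x in range(6)] for y in range(6)]
    smoothed = gaussian_low_pass(checker)
    flat = [value for row in smoothed for value in row]
    print(min(flat), max(flat))
```

Low-pass filtering suppresses fine detail and keeps coarse structure, which is the general reason such a filter can change what a model reports seeing in an image.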


The Illusion-Illusion: Vision Language Models See Illusions Where There are None

Ullman, Tomer

arXiv.org Artificial Intelligence, Dec-6-2024

Illusions are entertaining, but they are also a useful diagnostic tool in cognitive science, philosophy, and neuroscience. A typical illusion shows a gap between how something "really is" and how something "appears to be", and this gap helps us understand the mental processing that leads to how something appears to be. Illusions are also useful for investigating artificial systems, and much research has examined whether computational models of perception fall prey to the same illusions as people. Here, I invert the standard use of perceptual illusions to examine basic processing errors in current vision language models. I present these models with illusory-illusions, neighbors of common illusions that should not elicit processing errors. These include such things as perfectly reasonable ducks, crooked lines that truly are crooked, circles that seem to have different sizes because they are, in fact, of different sizes, and so on. I show that many current vision language systems mistakenly see these illusion-illusions as illusions. I suggest that such failures are part of broader failures already discussed in the literature.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.18613

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Germany > Saxony > Leipzig (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)


IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models

Shahgir, Haz Sameen, Sayeed, Khondker Salman, Bhattacharjee, Abhik, Ahmad, Wasi Uddin, Dong, Yue, Shahriyar, Rifat

arXiv.org Artificial Intelligence, Mar-30-2024

The advent of Vision Language Models (VLM) has allowed researchers to investigate the visual understanding of a neural network using natural language. Beyond object classification and detection, VLMs are capable of visual comprehension and common-sense reasoning. This naturally led to the question: How do VLMs respond when the image itself is inherently unreasonable? To this end, we present IllusionVQA: a diverse dataset of challenging optical illusions and hard-to-interpret scenes to test the capability of VLMs in two distinct multiple-choice VQA tasks - comprehension and soft localization. GPT4V, the best-performing VLM, achieves 62.99% accuracy (4-shot) on the comprehension task and 49.7% on the localization task (4-shot and Chain-of-Thought). Human evaluation reveals that humans achieve 91.03% and 100% accuracy in comprehension and localization. We discover that In-Context Learning (ICL) and Chain-of-Thought reasoning substantially degrade the performance of GeminiPro on the localization task. Tangentially, we discover a potential weakness in the ICL capabilities of VLMs: they fail to locate optical illusions even when the correct answer is in the context window as a few-shot example.

illusion, optical illusion, vlm, (15 more...)

arXiv.org Artificial Intelligence

2403.15952

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Riverside County > Riverside (0.04)
Asia > Japan > Shikoku > Kagawa Prefecture > Takamatsu (0.04)
Asia > Bangladesh (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models

Guan, Tianrui, Liu, Fuxiao, Wu, Xiyang, Xian, Ruiqi, Li, Zongxia, Liu, Xiaoyu, Wang, Xijun, Chen, Lichang, Huang, Furong, Yacoob, Yaser, Manocha, Dinesh, Zhou, Tianyi

arXiv.org Artificial Intelligence, Nov-28-2023

We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision) and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HallusionBench, we benchmarked 13 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion, but also deepens an understanding of these pitfalls. Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://github.com/tianyi-lab/HallusionBench.

gpt-4v, illusion, llava-1, (14 more...)

arXiv.org Artificial Intelligence

2310.14566

Country:

Europe > Russia (0.15)
Asia > Russia (0.15)
North America > United States > Arizona (0.05)
(25 more...)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.92)
Leisure & Entertainment > Sports (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
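
The HallusionBench abstract above reports a "question-pair accuracy". The abstract does not define the metric, so the sketch below assumes one plausible reading: questions are grouped into pairs (or small control groups), and a group counts as solved only if every question in it is answered correctly. The function name, the input format, and this all-or-nothing rule are assumptions, not the benchmark's official scoring code.

```python
from collections import defaultdict


def question_pair_accuracy(results):
    """Fraction of question groups in which every answer is correct.

    results: iterable of (pair_id, is_correct) tuples, one per question.
    Returns 0.0 when there are no results.
    """
    answers_by_pair = defaultdict(list)
    for pair_id, is_correct in results:
        answers_by_pair[pair_id].append(bool(is_correct))
    if not answers_by_pair:
        return 0.0
    solved = sum(1 for answers in answers_by_pair.values() if all(answers))
    return solved / len(answers_by_pair)


if __name__ == "__main__":
    # Hypothetical results: pair "a" fully correct, pair "b" half correct.
    demo = [("a", True), ("a", True), ("b", True), ("b", False)]
    print(question_pair_accuracy(demo))
```

Under this reading, a model that answers three of four questions correctly can still score only 50% at the pair level, which is why a grouped metric is stricter than per-question accuracy.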


InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion

Yang, Haobo, Wang, Wenyu, Cao, Ze, Duan, Zhekai, Liu, Xuchen

arXiv.org Artificial Intelligence, Jun-5-2023

This paper introduces a novel approach to evaluating deep learning models' capacity for in-diagram logic interpretation. Leveraging the intriguing realm of visual illusions, we establish a unique dataset, InDL, designed to rigorously test and benchmark these models. Deep learning has witnessed remarkable progress in domains such as computer vision and natural language processing. However, models often stumble in tasks requiring logical reasoning due to their inherent 'black box' characteristics, which obscure the decision-making process. Our work presents a new lens to understand these models better by focusing on their handling of visual illusions -- a complex interplay of perception and logic. We utilize six classic geometric optical illusions to create a comparative framework between human and machine visual perception. This methodology offers a quantifiable measure to rank models, elucidating potential weaknesses and providing actionable insights for model improvements. Our experimental results affirm the efficacy of our benchmarking strategy, demonstrating its ability to effectively rank models based on their logic interpretation ability. As part of our commitment to reproducible research, the source code and datasets will be made publicly available at https://github.com/rabbit-magic-wh/InDL

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2305.17716

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Kansas > Sheridan County (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)


Which optical illusions can animals see?

National Geographic, Dec-27-2021, 10:45:16 GMT

Visual illusions remind us that we are not passive decoders of reality but active interpreters. Our eyes capture information from the environment, but our brain can play tricks on us. Perception doesn't always match reality. Scientists have used illusions for decades to explore the psychological and cognitive processes that underlie human visual perception. More recently, evidence is emerging that suggests many animals, like us, can perceive and create a range of visual illusions.

illusion, optical illusion, perception, (15 more...)

National Geographic

Country:

Oceania > Australia (0.05)
North America > United States > Florida > Orange County (0.05)
Europe > Italy (0.05)

Industry: Leisure & Entertainment (0.48)

Technology:

Information Technology > Communications > Social Media (0.40)
Information Technology > Artificial Intelligence (0.35)
