Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Wüst, Antonia, Tobiasch, Tim, Helff, Lukas, Dhami, Devendra S., Rothkopf, Constantin A., Kersting, Kristian
Visual reasoning, the ability to understand, interpret, and reason about the visual world, is a fundamental aspect of human intelligence [27]. It allows us to navigate our environment, interact with objects, and make sense of complex visual scenes. In recent years, the field of artificial intelligence (AI) has advanced rapidly toward replicating aspects of this visual reasoning, with significant focus placed on Vision-Language Models (VLMs) [5, 24, 25]. These models integrate visual and textual information to generate descriptive content, aiming to mimic how humans comprehend and reason about the world. Because of their human-like responses, VLMs often create the illusion of possessing human-like perception and intelligence. However, as recent work shows, VLMs and the Large Language Models (LLMs) on which they are based have dramatic shortcomings in reasoning [30], in visual perception [12, 13, 19, 34], and in their combination [39, 47, 48]. Bongard problems (BPs), a class of visual puzzles that require identifying underlying rules from a limited set of images, provide a unique and challenging benchmark for assessing visual reasoning abilities in AI systems [4]. Conceived by the Russian scientist Mikhail Bongard in 1967, these puzzles test cognitive abilities in pattern recognition and abstract reasoning, posing a formidable challenge even to advanced AI systems [15].
arXiv.org Artificial Intelligence
Oct-25-2024