Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Wüst, Antonia, Tobiasch, Tim, Helff, Lukas, Dhami, Devendra S., Rothkopf, Constantin A., Kersting, Kristian
Visual reasoning, the ability to understand, interpret, and reason about the visual world, is a fundamental aspect of human intelligence [27]. It allows us to navigate our environment, interact with objects, and make sense of complex visual scenes. In recent years, the field of artificial intelligence (AI) has advanced rapidly toward replicating aspects of this visual reasoning, with significant focus placed on Vision-Language Models (VLMs) [5, 24, 25]. These models integrate visual and textual information to generate descriptive content, aiming to mimic how humans comprehend and reason about the world. Because of their human-like responses, VLMs often create the illusion of possessing human-like perception and intelligence. However, as recent work shows, VLMs and the Large Language Models (LLMs) on which they are based have dramatic shortcomings in reasoning [30], in visual perception [12, 13, 19, 34], and in their combination [39, 47, 48]. Bongard problems (BPs), a class of visual puzzles that require identifying underlying rules from a limited set of images, provide a unique and challenging benchmark for assessing visual reasoning abilities in AI systems [4]. Conceived by the Russian scientist Mikhail Bongard in 1967, these puzzles test cognitive abilities in pattern recognition and abstract reasoning, posing a formidable challenge even to advanced AI systems [15].
arXiv.org Artificial Intelligence
Oct-25-2024