progressive matrix
A Study of Rule Omission in Raven's Progressive Matrices
Analogical reasoning lies at the core of human cognition and remains a fundamental challenge for artificial intelligence. Raven's Progressive Matrices (RPM) serve as a widely used benchmark to assess abstract reasoning by requiring the inference of underlying structural rules. While many vision-based and language-based models have achieved success on RPM tasks, it remains unclear whether their performance reflects genuine reasoning ability or reliance on statistical shortcuts. This study investigates the generalization capacity of modern AI systems under conditions of incomplete training by deliberately omitting several structural rules during training. Both sequence-to-sequence transformer models and vision-based architectures such as CoPINet and the Dual-Contrast Network are evaluated on the Impartial-RAVEN (I-RAVEN) dataset. Experiments reveal that although transformers demonstrate strong performance on familiar rules, their accuracy declines sharply when faced with novel or omitted rules. Moreover, the gap between token-level accuracy and complete answer accuracy highlights fundamental limitations in current approaches. These findings provide new insights into the reasoning mechanisms underlying deep learning models and underscore the need for architectures that move beyond pattern recognition toward robust abstract reasoning.
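The gap between token-level accuracy and complete answer accuracy mentioned above can be illustrated with a small sketch. The abstract does not give the study's exact metric definitions, so the two functions below (`token_accuracy`, `answer_accuracy`) and the toy data are assumptions made for illustration: an answer is treated as a sequence of attribute tokens, and it counts as fully correct only if every token matches.

```python
from typing import Sequence

Answer = Sequence[int]

def token_accuracy(preds: Sequence[Answer], targets: Sequence[Answer]) -> float:
    """Fraction of individual tokens predicted correctly, pooled over all answers."""
    correct = 0
    total = 0
    for pred, target in zip(preds, targets):
        # Missing or extra tokens count as errors, so normalise by the longer length.
        total += max(len(pred), len(target))
        correct += sum(1 for p, t in zip(pred, target) if p == t)
    return correct / total

def answer_accuracy(preds: Sequence[Answer], targets: Sequence[Answer]) -> float:
    """Fraction of answers in which every token is correct."""
    exact = sum(1 for pred, target in zip(preds, targets) if list(pred) == list(target))
    return exact / len(targets)

# Hypothetical toy data: three answers of four attribute tokens each.
targets = [[3, 1, 4, 2], [0, 5, 5, 1], [2, 2, 6, 3]]
preds = [[3, 1, 4, 2], [0, 5, 4, 1], [2, 2, 6, 0]]

print(f"token accuracy:  {token_accuracy(preds, targets):.3f}")
print(f"answer accuracy: {answer_accuracy(preds, targets):.3f}")
```

In this toy case a single wrong token in two of the three answers leaves token accuracy high while answer accuracy falls to one in three, which is the kind of gap the abstract refers to.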
Towards Learning to Reason: Comparing LLMs with Neuro-Symbolic on Arithmetic Relations in Abstract Reasoning
Hersche, Michael, Camposampiero, Giacomo, Wattenhofer, Roger, Sebastian, Abu, Rahimi, Abbas
This work compares large language models (LLMs) and neuro-symbolic approaches in solving Raven's progressive matrices (RPM), a visual abstract reasoning test that involves the understanding of mathematical rules such as progression or arithmetic addition. Providing the visual attributes directly as textual prompts, which assumes an oracle visual perception module, allows us to measure the model's abstract reasoning capability in isolation. Despite providing such compositionally structured representations from the oracle visual perception and advanced prompting techniques, neither GPT-4 nor Llama-3 70B achieves perfect accuracy on the center constellation of the I-RAVEN dataset. Our analysis reveals that the root cause lies in the LLM's weakness in understanding and executing arithmetic rules. As a potential remedy, we analyze the Abductive Rule Learner with Context-awareness (ARLC), a neuro-symbolic approach that learns to reason with vector-symbolic architectures (VSAs). Here, concepts are represented with distributed vectors such that dot products between encoded vectors define a similarity kernel, and simple element-wise operations on the vectors perform addition/subtraction on the encoded values. We find that ARLC achieves almost perfect accuracy on the center constellation of I-RAVEN, demonstrating a high fidelity in arithmetic rules. To stress the length generalization capabilities of the models, we extend the RPM tests to larger matrices (3x10 instead of typical 3x3) and larger dynamic ranges of the attribute values (from 10 up to 1000). We find that the LLM's accuracy of solving arithmetic rules drops to sub-10%, especially as the dynamic range expands, while ARLC can maintain a high accuracy due to emulating symbolic computations on top of properly distributed representations. Our code is available at https://github.com/IBM/raven-large-language-models.
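The idea that dot products act as a similarity kernel while element-wise operations add or subtract encoded values can be sketched with one common VSA encoding. The abstract does not specify which encoding ARLC uses, so the phasor-based scheme below (often called fractional power encoding) is an assumption chosen for illustration, and the names `DIM`, `encode`, `similarity`, `add`, `subtract`, and `decode` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 1024  # Hypothetical vector dimension, chosen only for this sketch.

# One random phase per dimension; a value v is encoded as exp(i * phase * v).
phases = rng.uniform(-np.pi, np.pi, size=DIM)

def encode(value: float) -> np.ndarray:
    """Map a scalar to a distributed complex vector of unit-magnitude phasors."""
    return np.exp(1j * phases * value)

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Normalised dot product: 1 for equal values, near 0 for distinct integers."""
    return float(np.real(np.vdot(a, b)) / DIM)

def add(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise product adds the encoded values (phases sum)."""
    return a * b

def subtract(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise product with the conjugate subtracts the encoded values."""
    return a * np.conj(b)

def decode(vec: np.ndarray, candidates) -> int:
    """Return the candidate value whose encoding is most similar to vec."""
    return max(candidates, key=lambda v: similarity(encode(v), vec))

total = add(encode(3), encode(4))
diff = subtract(encode(9), encode(4))
print(decode(total, range(10)), decode(diff, range(10)))
```

Because the arithmetic happens in the phases, the same operations apply unchanged to larger value ranges; only the candidate set passed to `decode` grows.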
Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures
Hersche, Michael, di Stefano, Francesco, Hofmann, Thomas, Sebastian, Abu, Rahimi, Abbas
Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge. This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities, by using distributed computation and operators provided by vector-symbolic architectures (VSA). Instead of hard-coding the rule formulations associated with RPMs, our approach can learn the VSA rule formulations (hence the name Learn-VRF) with just one pass through the training data. Yet, our approach, with compact parameters, remains transparent and interpretable. Learn-VRF yields accurate predictions on I-RAVEN's in-distribution data, and exhibits strong out-of-distribution capabilities concerning unseen attribute-rule pairs, significantly outperforming pure connectionist baselines including large language models. Our code is available at https://github.com/
The minimal computational substrate of fluid intelligence
Nelson, Amy PK, Mole, Joe, Pombo, Guilherme, Gray, Robert J, Ruffle, James K, Chan, Edgar, Rees, Geraint E, Cipolotti, Lisa, Nachev, Parashkev
The quantification of cognitive powers rests on identifying a behavioural task that depends on them. Such dependence cannot be assured, for the powers a task invokes cannot be experimentally controlled or constrained a priori, resulting in unknown vulnerability to failure of specificity and generalisability. Evaluating a compact version of Raven's Advanced Progressive Matrices (RAPM), a widely used clinical test of fluid intelligence, we show that LaMa, a self-supervised artificial neural network trained solely on the completion of partially masked images of natural environmental scenes, achieves human-level test scores a prima vista, without any task-specific inductive bias or training. Compared with cohorts of healthy and focally lesioned participants, LaMa exhibits human-like variation with item difficulty, and produces errors characteristic of right frontal lobe damage under degradation of its ability to integrate global spatial patterns. LaMa's narrow training and limited capacity -- comparable to the nervous system of the fruit fly -- suggest RAPM may be open to computationally simple solutions that need not necessarily invoke abstract reasoning.
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
Małkiński, Mikołaj, Mańdziuk, Jacek
The abstract visual reasoning (AVR) domain encompasses problems whose solution requires the ability to reason about relations among entities present in a given scene. While humans generally solve AVR tasks in a "natural" way, even without prior experience, this type of problem has proven difficult for current machine learning systems. The paper summarises recent progress in applying deep learning methods to solving AVR problems, as a proxy for studying machine intelligence. We focus on the most common type of AVR tasks -- the Raven's Progressive Matrices (RPMs) -- and provide a comprehensive review of the learning methods and deep neural models applied to solve RPMs, as well as the RPM benchmark sets. Performance analysis of the state-of-the-art approaches to solving RPMs leads to the formulation of certain insights and remarks on the current and future trends in this area. We conclude the paper by demonstrating how real-world problems can benefit from the discoveries of RPM studies.
Facebook's New AI System Can Pass Multiple-Choice Intelligence Tests
Recently, a team of researchers from Facebook AI and Tel Aviv University proposed an AI system that solves the multiple-choice intelligence test, Raven's Progressive Matrices. The proposed AI system is a neural network model that combines multiple advances in generative models, including employing multiple pathways through the same network. Raven's Progressive Matrices, also known as Raven's Matrices, are multiple-choice intelligence tests. The test is used to measure abstract reasoning and is regarded as a non-verbal estimate of fluid intelligence. In this test, a person tries to fill in the missing cell in a 3x3 grid of abstract images.
Solving Raven's Progressive Matrices with Multi-Layer Relation Networks
Jahrens, Marius, Martinetz, Thomas
Raven's Progressive Matrices are a benchmark originally designed to test the cognitive abilities of humans. They have recently been adapted to test relational reasoning in machine learning systems. For this purpose the so-called Procedurally Generated Matrices dataset was set up, which is so far one of the most difficult relational reasoning benchmarks. Here we show that deep neural networks are capable of solving this benchmark, reaching an accuracy of 98.0 percent, compared with the previous state of the art of 62.6 percent, by combining Wild Relation Networks with Multi-Layer Relation Networks and introducing Magnitude Encoding, an encoding scheme designed for late fusion architectures.
The Structural Affinity Method for Solving the Raven's Progressive Matrices Test for Intelligence
Shegheva, Snejana (Georgia Institute of Technology) | Goel, Ashok (Georgia Institute of Technology)
Graphical models offer techniques for capturing the structure of many problems in real-world domains and provide means for representation, interpretation, and inference. The modeling framework provides tools for discovering rules for solving problems by exploring structural relationships. We present the Structural Affinity method that uses graphical models for first learning and subsequently recognizing the pattern for solving problems on the Raven's Progressive Matrices Test of general human intelligence. Recently there has been considerable work on computational models for addressing the Raven's test using various representations ranging from fractals to symbolic structures. In contrast, our method uses Markov Random Fields parameterized by affinity factors to discover the structure in the geometric analogy problems and induce the rules of Carpenter et al.'s cognitive model of problem-solving on the Raven's Progressive Matrices Test. We provide a computational account that first learns the structure of a Raven's problem and then predicts the solution by computing the probability of the correct answer by recognizing patterns corresponding to Carpenter et al.'s rules. We demonstrate that the performance of our model on the Standard Raven Progressive Matrices is comparable with existing state-of-the-art models.
AI scores higher than the average person on standard test
Artificial intelligence can now outperform humans on a standard intelligence test. A new computational model scores within the 75th percentile, better than the average person, on a test known as Raven's Progressive Matrices. Researchers say this demonstrates that it can take on abstract visual reasoning tasks, and is a major step toward AI that can see and understand the world the way we do. Using Raven's Progressive Matrices, a nonverbal standardized test that measures abstract reasoning, the team found that their model is not only on par with humans, but performs better than many.
Automatic Generation of Raven’s Progressive Matrices
Wang, Ke (University of California, Davis) | Su, Zhendong (University of California, Davis)
Raven’s Progressive Matrices (RPMs) are a popular family of general intelligence tests, and provide a non-verbal measure of a test subject’s reasoning abilities. Traditionally RPMs have been manually designed. To make them readily available for both practice and examination, we tackle the problem of automatically synthesizing RPMs. Our goal is to efficiently generate a large number of RPMs that are authentic (i.e. similar to manually written problems), interesting (i.e. diverse in terms of difficulty), and well-formed (i.e. unambiguous). The main technical challenges are how to formalize RPMs to accommodate their seemingly enormous diversity, and how to define and enforce their validity. To this end, we (1) introduce an abstract representation of RPMs using first-order logic, and (2) restrict instantiations to only valid RPMs. We have realized our approach and evaluated its efficiency and effectiveness. We show that our system can generate hundreds of valid problems per second with varying levels of difficulty. More importantly, we show, via a user study with 24 participants, that the generated problems are statistically indistinguishable from actual problems. This work is an exciting instance of how logic and reasoning may aid general learning.
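The notion of a well-formed (unambiguous) problem can be made concrete with a toy sketch. The authors describe a first-order-logic representation whose details are not given in the abstract; the code below is not their system. It assumes a single numeric attribute per cell and one hypothetical rule (a constant left-to-right progression in each row), with a plain Python predicate standing in for the logical formula. All function names are made up for illustration.

```python
import random

def make_progression_matrix(start_values, step):
    """Build a 3x3 grid in which every row increases by `step` from left to right."""
    return [[start + step * col for col in range(3)] for start in start_values]

def satisfies_rule(row, step):
    """Predicate standing in for the rule: consecutive cells differ by `step`."""
    return all(row[i + 1] - row[i] == step for i in range(len(row) - 1))

def make_problem(start_values, step, num_choices=8, seed=0):
    """Hide the bottom-right cell and build an answer set with distractors."""
    rng = random.Random(seed)
    grid = make_progression_matrix(start_values, step)
    answer = grid[2][2]
    context = [row[:] for row in grid]
    context[2][2] = None  # The cell the test taker must fill in.
    distractors = set()
    while len(distractors) < num_choices - 1:
        candidate = answer + rng.randint(-5, 5)
        if candidate != answer:
            distractors.add(candidate)
    choices = sorted(distractors | {answer})
    return context, choices, answer

def is_well_formed(context, choices, step):
    """A problem is unambiguous if exactly one choice completes the last row."""
    partial_row = context[2][:2]
    valid = [c for c in choices if satisfies_rule(partial_row + [c], step)]
    return len(valid) == 1

context, choices, answer = make_problem(start_values=[1, 2, 3], step=2)
print(context, choices, answer, is_well_formed(context, choices, step=2))
```

Restricting generation to instances that pass a check like `is_well_formed` mirrors, in a much simplified form, the idea of allowing only valid instantiations.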