Goto

Collaborating Authors

 Kunda, Maithilee


A Neurodiversity-Inspired Solver for the Abstraction \& Reasoning Corpus (ARC) Using Visual Imagery and Program Synthesis

arXiv.org Artificial Intelligence

Core knowledge about physical objects -- e.g., their permanency, spatial transformations, and interactions -- is one of the most fundamental building blocks of biological intelligence across humans and non-human animals. While AI techniques in certain domains (e.g. vision, NLP) have advanced dramatically in recent years, no current AI systems can yet match human abilities in flexibly applying core knowledge to solve novel tasks. We propose a new AI approach to core knowledge that combines 1) visual representations of core knowledge inspired by human mental imagery abilities, especially as observed in studies of neurodivergent individuals; with 2) tree-search-based program synthesis for flexibly combining core knowledge to form new reasoning strategies on the fly. We demonstrate our system's performance on the very difficult Abstraction \& Reasoning Corpus (ARC) challenge, and we share experimental results from publicly available ARC items as well as from our 4th-place finish on the private test set during the 2022 global ARCathon challenge.


A Cognitively-Inspired Neural Architecture for Visual Abstract Reasoning Using Contrastive Perceptual and Conceptual Processing

arXiv.org Artificial Intelligence

We introduce a new neural architecture for solving visual abstract reasoning tasks inspired by human cognition, specifically by observations that human abstract reasoning often interleaves perceptual and conceptual processing as part of a flexible, iterative, and dynamic cognitive process. Inspired by this principle, our architecture models visual abstract reasoning as an iterative, self-contrasting learning process that pursues consistency between perceptual and conceptual processing of visual stimuli. We explain how this new Contrastive Perceptual-Conceptual Network (CPCNet) works using matrix reasoning problems in the style of the well-known Raven's Progressive Matrices intelligence test. Experiments on the machine learning dataset RAVEN show that CPCNet achieves higher accuracy than all previously published models while also using the weakest inductive bias. We also point out a substantial and previously unremarked class imbalance in the original RAVEN dataset, and we propose a new variant of RAVEN -- AB-RAVEN -- that is more balanced in terms of abstract concepts.


A Computational Account Of Self-Supervised Visual Learning From Egocentric Object Play

arXiv.org Artificial Intelligence

Research in child development has shown that embodied experience handling physical objects contributes to many cognitive abilities, including visual learning. One characteristic of such experience is that the learner sees the same object from several different viewpoints. In this paper, we study how learning signals that equate different viewpoints -- e.g., assigning similar representations to different views of a single object -- can support robust visual learning. We use the Toybox dataset, which contains egocentric videos of humans manipulating different objects, and conduct experiments using a computer vision framework for self-supervised contrastive learning. We find that representations learned by equating different physical viewpoints of an object benefit downstream image classification accuracy. Further experiments show that this performance improvement is robust to variations in the gaps between viewpoints, and that the benefits transfer to several different image classification tasks.


Deep Non-Monotonic Reasoning for Visual Abstract Reasoning Tasks

arXiv.org Artificial Intelligence

While achieving unmatched performance on many well-defined tasks, deep learning models have also been used to solve visual abstract reasoning tasks, which are relatively less well-defined, and have been widely used to measure human intelligence. However, current deep models struggle to match human abilities to solve such tasks with minimum data but maximum generalization. One limitation is that current deep learning models work in a monotonic way, i.e., treating different parts of the input in essentially fixed orderings, whereas people repeatedly observe and reason about the different parts of the visual stimuli until the reasoning process converges to a consistent conclusion, i.e., non-monotonic reasoning. This paper proposes a non-monotonic computational approach to solve visual abstract reasoning tasks. In particular, we implemented a deep learning model using this approach and tested it on the RAVEN dataset -- a dataset inspired by the Raven's Progressive Matrices test. Results show that the proposed approach is more effective than existing monotonic deep learning models, under strict experimental settings that represent a difficult variant of the RAVEN dataset problem.


Automatic Item Generation of Figural Analogy Problems: A Review and Outlook

arXiv.org Artificial Intelligence

Figural analogy problems have long been a widely used format in human intelligence tests. In the past four decades, more and more research has investigated automatic item generation for figural analogy problems, i.e., algorithmic approaches for systematically and automatically creating such problems. In cognitive science and psychometrics, this research can deepen our understandings of human analogical ability and psychometric properties of figural analogies. With the recent development of data-driven AI models for reasoning about figural analogies, the territory of automatic item generation of figural analogies has further expanded. This expansion brings new challenges as well as opportunities, which demand reflection on previous item generation research and planning future studies. This paper reviews the important works of automatic item generation of figural analogies for both human intelligence tests and data-driven AI models. From an interdisciplinary perspective, the principles and technical details of these works are analyzed and compared, and desiderata for future research are suggested.


The AI Triplet: Computational, Conceptual, and Mathematical Representations in AI Education

arXiv.org Artificial Intelligence

Expertise in AI requires integrating computational, conceptual, and mathematical knowledge and representations. We propose this trifecta as an "AI triplet," similar in spirit to the "chemistry triplet" that has influenced the past four decades of chemistry education. We describe a rationale for this triplet and how it maps onto topics commonly taught in AI courses, such as tree search and gradient descent. Also, similar to impacts of the chemistry triplet on chemistry education, we suggest an initial example of how considering the AI triplet may help pinpoint obstacles in AI education, i.e., how student learning might be scaffolded to approach expert-level flexibility in moving between the points of the triplet.


Creative Captioning: An AI Grand Challenge Based on the Dixit Board Game

arXiv.org Artificial Intelligence

We propose a new class of "grand challenge" AI problems that we call creative captioning---generating clever, interesting, or abstract captions for images, as well as understanding such captions. Creative captioning draws on core AI research areas of vision, natural language processing, narrative reasoning, and social reasoning, and across all these areas, it requires sophisticated uses of common sense and cultural knowledge. In this paper, we analyze several specific research problems that fall under creative captioning, using the popular board game Dixit as both inspiration and proposed testing ground. We expect that Dixit could serve as an engaging and motivating benchmark for creative captioning across numerous AI research communities for the coming 1-2 decades.


Quantifying Human Behavior on the Block Design Test Through Automated Multi-Level Analysis of Overhead Video

arXiv.org Artificial Intelligence

The block design test is a standardized, widely used neuropsychological assessment of visuospatial reasoning that involves a person recreating a series of given designs out of a set of colored blocks. In current testing procedures, an expert neuropsychologist observes a person's accuracy and completion time as well as overall impressions of the person's problem-solving procedures, errors, etc., thus obtaining a holistic though subjective and often qualitative view of the person's cognitive processes. We propose a new framework that combines room sensors and AI techniques to augment the information available to neuropsychologists from block design and similar tabletop assessments. In particular, a ceiling-mounted camera captures an overhead view of the table surface. From this video, we demonstrate how automated classification using machine learning can produce a frame-level description of the state of the block task and the person's actions over the course of each test problem. We also show how a sequence-comparison algorithm can classify one individual's problem-solving strategy relative to a database of simulated strategies, and how these quantitative results can be visualized for use by neuropsychologists.


Thinking in PolAR Pictures: Using Rotation-Friendly Mental Images to Solve Leiter-R Form Completion

AAAI Conferences

The Leiter International Performance Scale-Revised (Leiter-R) is a standardized cognitive test that seeks to "provide a nonverbal measure of general intelligence by sampling a wide variety of functions from memory to nonverbal reasoning." Understanding the computational building blocks of nonverbal cognition, as measured by the Leiter-R, is an important step towards understanding human nonverbal cognition, especially with respect to typical and atypical trajectories of child development. One subtest of the Leiter-R, Form Completion, involves synthesizing and localizing a visual figure from its constituent slices. Form Completion poses an interesting nonverbal problem that seems to combine several aspects of visual memory, mental rotation, and visual search. We describe a new computational cognitive model that addresses Form Completion using a novel, mental-rotation-friendly image representation that we call the Polar Augmented Resolution (PolAR) Picture, which enables high-fidelity mental rotation operations. We present preliminary results using actual Leiter-R test items and discuss directions for future work.


Understanding the Role of Visual Mental Imagery in Intelligence: The Retinotopic Reasoning (R2) Cognitive Architecture

AAAI Conferences

This paper presents a new Retinotopic Reasoning (R2) cognitive architecture that is inspired by studies of visual mental imagery in people. R2 is a hybrid symbolic-connectionist architecture, with certain components of the system represented in propositional, symbolic form, but with a primary working memory store that contains visual ``mental'' images that can be created and manipulated by the system. R2 is not intended to serve as a full-fledged, stand-alone cognitive architecture, but rather is a specialized system focusing on how visual mental imagery can be represented, learned, and used in support of intelligent behavior. Examples illustrate how R2 can be used to model human visuospatial cognition on several different standardized cognitive tests, including the Raven's Progressive Matrices test, the Block Design test, the Embedded Figures test, and the Paper Folding test.