Campbell, Declan
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Ku, Alexander, Campbell, Declan, Bai, Xuechunzi, Geng, Jiayi, Liu, Ryan, Marjieh, Raja, McCoy, R. Thomas, Nam, Andrew, Sucholutsky, Ilia, Veselovsky, Veniamin, Zhang, Liyi, Zhu, Jian-Qiao, Griffiths, Thomas L.
Modern artificial intelligence systems, such as large language models, are increasingly powerful but also increasingly hard to understand. Recognizing this problem as analogous to the historical difficulties in understanding the human mind, we argue that methods developed in cognitive science can be useful for understanding large language models. We propose a framework for applying these methods based on Marr's three levels of analysis. By revisiting established cognitive science techniques relevant to each level and illustrating their potential to yield insights into the behavior and internal organization of large language models, we aim to provide a toolkit for making sense of these new kinds of minds.
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
Yang, Yukang, Campbell, Declan, Huang, Kaixuan, Wang, Mengdi, Cohen, Jonathan, Webb, Taylor
Many recent studies have found evidence for emergent reasoning capabilities in large language models, but debate persists concerning the robustness of these capabilities, and the extent to which they depend on structured reasoning mechanisms. To shed light on these issues, we perform a comprehensive study of the internal mechanisms that support abstract rule induction in an open-source language model (Llama3-70B). We identify an emergent symbolic architecture that implements abstract reasoning via a series of three computations. In early layers, symbol abstraction heads convert input tokens to abstract variables based on the relations between those tokens. In intermediate layers, symbolic induction heads perform sequence induction over these abstract variables. Finally, in later layers, retrieval heads predict the next token by retrieving the value associated with the predicted abstract variable. These results point toward a resolution of the longstanding debate between symbolic and neural network approaches, suggesting that emergent reasoning in neural networks depends on the emergence of symbolic mechanisms.
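As a toy illustration of the three-stage computation described in this abstract (symbol abstraction, symbolic induction, retrieval), the sketch below walks through an A-B-A rule-induction task in plain Python. It is not the paper's Llama3-70B circuitry; all function names are illustrative placeholders.

```python
# Toy sketch (not the paper's actual Llama3-70B circuits): the three-stage
# symbolic pipeline described in the abstract, illustrated on an A-B-A
# rule-induction task. All names here are illustrative placeholders.

def abstract_symbols(tokens):
    """Symbol abstraction: replace tokens with abstract variables (A, B, ...)
    based solely on the equality relations between tokens."""
    variables, mapping = [], {}
    for tok in tokens:
        if tok not in mapping:
            mapping[tok] = chr(ord("A") + len(mapping))
        variables.append(mapping[tok])
    return variables, mapping

def induce_next_variable(demonstrations, query_vars):
    """Symbolic induction: infer which abstract variable comes next in the
    query by matching against the pattern shared by the demonstrations."""
    pattern = abstract_symbols(demonstrations[0])[0]       # e.g. ['A', 'B', 'A']
    assert all(abstract_symbols(d)[0] == pattern for d in demonstrations)
    return pattern[len(query_vars)]                        # next variable in the pattern

def retrieve_token(variable, mapping):
    """Retrieval: map the predicted abstract variable back to the concrete
    token that instantiated it in the query context."""
    inverse = {v: k for k, v in mapping.items()}
    return inverse[variable]

# In-context examples all follow the abstract rule A B A.
demos = [("cat", "dog", "cat"), ("sun", "moon", "sun")]
query = ("red", "blue")                                    # expected completion: "red"

query_vars, query_map = abstract_symbols(query)
next_var = induce_next_variable(demos, query_vars)
print(retrieve_token(next_var, query_map))                 # -> red
```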
Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
Campbell, Declan, Rane, Sunayana, Giallanza, Tyler, De Sabbata, Nicolò, Ghods, Kia, Joshi, Amogh, Ku, Alexander, Frankland, Steven M., Griffiths, Thomas L., Cohen, Jonathan D., Webb, Taylor W.
Recent work has documented striking heterogeneity in the performance of state-of-the-art vision language models (VLMs), including both multimodal language models and text-to-image models. These models are able to describe and generate a diverse array of complex, naturalistic images, yet they exhibit surprising failures on basic multi-object reasoning tasks -- such as counting, localization, and simple forms of visual analogy -- that humans perform with near perfect accuracy. To better understand this puzzling pattern of successes and failures, we turn to theoretical accounts of the binding problem in cognitive science and neuroscience, a fundamental problem that arises when a shared set of representational resources must be used to represent distinct entities (e.g., to represent multiple objects in an image), necessitating the use of serial processing to avoid interference. We find that many of the puzzling failures of state-of-the-art VLMs can be explained as arising due to the binding problem, and that these failure modes are strikingly similar to the limitations exhibited by rapid, feedforward processing in the human brain.
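A minimal numerical sketch of the binding problem the abstract appeals to: if multiple objects are encoded by summing their features into one shared representation, the information about which feature belongs to which object is lost. The encoding below is an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Toy illustration (not from the paper): why a shared, additive code loses
# feature-object bindings. Feature vectors are arbitrary one-hot codes.
features = dict(zip(["red", "blue", "square", "circle"], np.eye(4)))

def scene(objects):
    """Encode a multi-object scene by summing per-object feature bundles
    into a single shared representation (no binding mechanism)."""
    return sum(features[color] + features[shape] for color, shape in objects)

scene_a = scene([("red", "square"), ("blue", "circle")])
scene_b = scene([("red", "circle"), ("blue", "square")])

# The two scenes have different bindings but identical shared codes,
# so any readout from this representation cannot tell them apart.
print(np.array_equal(scene_a, scene_b))   # -> True
```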
Using Contrastive Learning with Generative Similarity to Learn Spaces that Capture Human Inductive Biases
Marjieh, Raja, Kumar, Sreejan, Campbell, Declan, Zhang, Liyi, Bencomo, Gianluca, Snell, Jake, Griffiths, Thomas L.
Humans rely on strong inductive biases to learn from few examples and abstract useful information from sensory data. Instilling such biases in machine learning models has been shown to improve their performance on various benchmarks including few-shot learning, robustness, and alignment. However, finding effective training procedures to achieve that goal can be challenging as psychologically-rich training data such as human similarity judgments are expensive to scale, and Bayesian models of human inductive biases are often intractable for complex, realistic domains. Here, we address this challenge by introducing a Bayesian notion of generative similarity whereby two datapoints are considered similar if they are likely to have been sampled from the same distribution. This measure can be applied to complex generative processes, including probabilistic programs. We show that generative similarity can be used to define a contrastive learning objective even when its exact form is intractable, enabling learning of spatial embeddings that express specific inductive biases. We demonstrate the utility of our approach by showing how it can be used to capture human inductive biases for geometric shapes, and to better distinguish different abstract drawing styles that are parameterized by probabilistic programs.
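The sketch below shows one way a generative-similarity signal can plug into a standard contrastive (InfoNCE-style) objective, treating two datapoints as a positive pair when they are drawn from the same generative program. It follows the general recipe in the abstract but is not the authors' implementation; the encoder and data here are placeholders.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the authors' exact implementation): a contrastive
# objective in which positives are pairs sampled from the SAME generative
# process and negatives are everything else in the batch.

def generative_contrastive_loss(encoder, x_a, x_b, temperature=0.1):
    """x_a[i] and x_b[i] are two samples drawn from the same generative
    program (hence 'generatively similar'); rows with different indices
    come from different programs and serve as negatives."""
    z_a = F.normalize(encoder(x_a), dim=-1)
    z_b = F.normalize(encoder(x_b), dim=-1)
    logits = z_a @ z_b.T / temperature            # pairwise similarities
    targets = torch.arange(len(x_a))              # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with a placeholder encoder and synthetic data standing in for
# samples from probabilistic programs.
encoder = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))
x_a, x_b = torch.randn(8, 32), torch.randn(8, 32)
loss = generative_contrastive_loss(encoder, x_a, x_b)
loss.backward()
```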
A Relational Inductive Bias for Dimensional Abstraction in Neural Networks
Campbell, Declan, Cohen, Jonathan D.
The human cognitive system exhibits remarkable flexibility and generalization capabilities, partly due to its ability to form low-dimensional, compositional representations of the environment. In contrast, standard neural network architectures often struggle with abstract reasoning tasks, are prone to overfitting, and require extensive data for training. This paper investigates the impact of the relational bottleneck -- a mechanism that focuses processing on relations among inputs -- on the learning of factorized representations conducive to compositional coding and the attendant flexibility of processing. We demonstrate that such a bottleneck not only improves generalization and learning efficiency, but also aligns network performance with human-like behavioral biases. Networks trained with the relational bottleneck developed orthogonal representations of feature dimensions latent in the dataset, reflecting the factorized structure thought to underlie human cognitive flexibility. Moreover, the relational network mimics human biases towards regularity without pre-specified symbolic primitives, suggesting that the bottleneck fosters the emergence of abstract representations that confer flexibility akin to symbols.
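A minimal sketch of a relational bottleneck layer in the spirit described above: downstream processing receives only the matrix of pairwise relations (here, inner products) between input embeddings, never the embeddings themselves. This is an illustrative assumption, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the paper's exact architecture) of a relational
# bottleneck: the downstream network sees only the matrix of pairwise
# relations between input embeddings, never the embeddings themselves.

class RelationalBottleneck(nn.Module):
    def __init__(self, num_objects, embed_dim, hidden_dim, out_dim):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, embed_dim)
        self.readout = nn.Sequential(
            nn.Linear(num_objects * num_objects, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):                           # x: (batch, num_objects, embed_dim)
        z = self.encoder(x)
        relations = z @ z.transpose(1, 2)           # (batch, n, n) pairwise inner products
        return self.readout(relations.flatten(1))   # only relations pass the bottleneck

model = RelationalBottleneck(num_objects=4, embed_dim=32, hidden_dim=64, out_dim=2)
out = model(torch.randn(8, 4, 32))                  # -> shape (8, 2)
```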
Human-Like Geometric Abstraction in Large Pre-trained Neural Networks
Campbell, Declan, Kumar, Sreejan, Giallanza, Tyler, Griffiths, Thomas L., Cohen, Jonathan D.
Humans form abstractions that can capture regularities in the external world. By forming abstractions that can generalize to future experience, humans are able to exhibit efficient learning and strong generalization across domains (Lake, Salakhutdinov, & Tenenbaum, 2015; Hull, 1920). One domain in which this has been observed by cognitive scientists is geometric reasoning (Dehaene, Al Roumi, Lakretz, Planton, & Sablé-Meyer, 2022), where people consistently extract abstract concepts, such as parallelism, symmetry, and convexity, that generalize across many visual instances. Specifically, we apply neural network models to behavioral tasks from recent empirical work (Sablé-Meyer et al., 2021, 2022; Hsu, Wu, & Goodman, 2022) that catalogues three effects indicative of abstraction in human geometric reasoning. First, humans are sensitive to geometric complexity, such that they are slower to recall complex images as compared to simpler ones (Sablé-Meyer et al., 2022). Second, humans are sensitive to geometric regularity (based on features such as right angles, parallel sides, and symmetry), such that they are able to classify regular shapes more easily than irregular ones.
Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction
Kumar, Sreejan, Marjieh, Raja, Zhang, Byron, Campbell, Declan, Hu, Michael Y., Bhatt, Umang, Lake, Brenden, Griffiths, Thomas L.
Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect of language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and found that adding language as a modality has a larger effect on human reproductions than on GPT-4's. This suggests that human visual and linguistic representations are more dissociable than those of GPT-4.
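The paradigm can be summarized as a simple loop that alternates modalities along the chain. In the sketch below, describe_image and generate_image are hypothetical stand-ins for a participant or model (a human subject, GPT-4, or a text-to-image system), not a real API.

```python
# Schematic sketch of a multimodal serial reproduction chain, alternating
# between visual and linguistic reproductions. The two helpers below are
# hypothetical placeholders for whatever agent fills that role in the chain.

def describe_image(image):
    """Placeholder: a human or multimodal model produces a description."""
    return f"a scene containing {image}"          # toy stand-in

def generate_image(text):
    """Placeholder: a human or text-to-image model reproduces an image."""
    return f"[image drawn from: {text}]"          # toy stand-in

def multimodal_chain(seed_image, num_generations=10):
    """Run a telephone-game chain: image -> description -> image -> ..."""
    chain, stimulus, modality = [seed_image], seed_image, "image"
    for _ in range(num_generations):
        if modality == "image":
            stimulus, modality = describe_image(stimulus), "text"
        else:
            stimulus, modality = generate_image(stimulus), "image"
        chain.append(stimulus)
    return chain   # inspect how abstractions drift across generations

print(multimodal_chain("a red triangle above a blue square", num_generations=4))
```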
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
Webb, Taylor W., Frankland, Steven M., Altabaa, Awni, Krishnamurthy, Kamesh, Campbell, Declan, Russin, Jacob, O'Reilly, Randall, Lafferty, John, Cohen, Jonathan D.
A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This effort has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.