Goto

Collaborating Authors

 Dayan, Peter


Complexity in Complexity: Understanding Visual Complexity Through Structure, Color, and Surprise

arXiv.org Artificial Intelligence

Understanding human perception of visual complexity is crucial in visual cognition. Recently (Shen, et al. 2024) proposed an interpretable segmentation-based model that accurately predicted complexity across various datasets, supporting the idea that complexity can be explained simply. In this work, we investigate the failure of their model to capture structural, color and surprisal contributions to complexity. To this end, we propose Multi-Scale Sobel Gradient which measures spatial intensity variations, Multi-Scale Unique Color which quantifies colorfulness across multiple scales, and surprise scores generated using a Large Language Model. We test our features on existing benchmarks and a novel dataset containing surprising images from Visual Genome. Our experiments demonstrate that modeling complexity accurately is not as simple as previously thought, requiring additional perceptual and semantic factors to address dataset biases. Thus our results offer deeper insights into how humans assess visual complexity.


Centaur: a foundation model of human cognition

arXiv.org Artificial Intelligence

Establishing a unified theory of cognition has been a major goal of psychology. While there have been previous attempts to instantiate such theories by building computational models, we currently do not have one model that captures the human mind in its entirety. Here we introduce Centaur, a computational model that can predict and simulate human behavior in any experiment expressible in natural language. We derived Centaur by finetuning a state-of-the-art language model on a novel, large-scale data set called Psych-101. Psych-101 reaches an unprecedented scale, covering trial-by-trial data from over 60,000 participants performing over 10,000,000 choices in 160 experiments. Centaur not only captures the behavior of held-out participants better than existing cognitive models, but also generalizes to new cover stories, structural task modifications, and entirely new domains. Furthermore, we find that the model's internal representations become more aligned with human neural activity after finetuning. Taken together, Centaur is the first real candidate for a unified model of human cognition. We anticipate that it will have a disruptive impact on the cognitive sciences, challenging the existing paradigm for developing computational models.


Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences

arXiv.org Artificial Intelligence

Humans excel at learning abstract patterns across different sequences, filtering out irrelevant details, and transferring these generalized concepts to new sequences. In contrast, many sequence learning models lack the ability to abstract, which leads to memory inefficiency and poor transfer. We introduce a non-parametric hierarchical variable learning model (HVM) that learns chunks from sequences and abstracts contextually similar chunks as variables. HVM efficiently organizes memory while uncovering abstractions, leading to compact sequence representations. When learning on language datasets such as babyLM, HVM learns a more efficient dictionary than standard compression algorithms such as Lempel-Ziv. In a sequence recall task requiring the acquisition and transfer of variables embedded in sequences, we demonstrate HVM's sequence likelihood correlates with human recall times. In contrast, large language models (LLMs) struggle to transfer abstract variables as effectively as humans. From HVM's adjustable layer of abstraction, we demonstrate that the model realizes a precise trade-off between compression and generalization. Our work offers a cognitive model that captures the learning and transfer of abstract representations in human cognition and differentiates itself from the behavior of large language models.


Next state prediction gives rise to entangled, yet compositional representations of objects

arXiv.org Artificial Intelligence

A BSTRACT Compositional representations are thought to enable humans to generalize across combinatorially vast state spaces. Models with learnable object slots, which encode information about objects in separate latent codes, have shown promise for this type of generalization but rely on strong architectural priors. Models with distributed representations, on the other hand, use overlapping, potentially entangled neural codes, and their ability to support compositional generalization remains underexplored. In this paper we examine whether distributed models can develop linearly separable representations of objects, like slotted models, through unsupervised training on videos of object interactions. We show that, surprisingly, models with distributed representations often match or outperform models with object slots in downstream prediction tasks. Furthermore, we find that linearly separable object representations can emerge without object-centric priors, with auxiliary objectives like next-state prediction playing a key role. Finally, we observe that distributed models' object representations are never fully disentangled, even if they are linearly separable: Multiple objects can be encoded through partially overlapping neural populations while still being highly separable with a linear classifier. We hypothesize that maintaining partially shared codes enables distributed models to better compress object dynamics, potentially enhancing generalization. 1 I NTRODUCTION Humans naturally decompose scenes, events and processes in terms of the objects that feature in them (Tenenbaum et al., 2011; Lake et al., 2017). These object-centric construals have been argued to explain humans' ability to reason and generalize successfully (Goodman et al., 2008; Lake et al., 2015; Schulze Buschoff et al., 2023). It has therefore long been a chief aim in machine learning research to design models and agents that learn to represent the world compositionally, e.g. in terms of the building blocks that compose it.


Characterising the Creative Process in Humans and Large Language Models

arXiv.org Artificial Intelligence

Large language models appear quite creative, often performing on par with the average human on creative tasks. However, research on LLM creativity has focused solely on \textit{products}, with little attention on the creative \textit{process}. Process analyses of human creativity often require hand-coded categories or exploit response times, which do not apply to LLMs. We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task, and contrast with behaviour in a Verbal Fluency Task. We use sentence embeddings to identify response categories and compute semantic similarities, which we use to generate jump profiles. Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity, where both pathways lead to similar creativity scores. LLMs were found to be biased towards either persistent or flexible paths, that varied across tasks. Though LLMs as a population match human profiles, their relationship with creativity is different, where the more flexible models score higher on creativity. Our dataset and scripts are available on \href{https://github.com/surabhisnath/Creative_Process}{GitHub}.


Simplicity in Complexity : Explaining Visual Complexity using Deep Segmentation Models

arXiv.org Artificial Intelligence

The complexity of visual stimuli plays an important role in many cognitive phenomena, including attention, engagement, memorability, time perception and aesthetic evaluation. Despite its importance, complexity is poorly understood and ironically, previous models of image complexity have been quite complex. There have been many attempts to find handcrafted features that explain complexity, but these features are usually dataset specific, and hence fail to generalise. On the other hand, more recent work has employed deep neural networks to predict complexity, but these models remain difficult to interpret, and do not guide a theoretical understanding of the problem. Here we propose to model complexity using segment-based representations of images. We use state-of-the-art segmentation models, SAM and FC-CLIP, to quantify the number of segments at multiple granularities, and the number of classes in an image respectively. We find that complexity is well-explained by a simple linear model with these two features across six diverse image-sets of naturalistic scene and art images. This suggests that the complexity of images can be surprisingly simple.


Detecting and Deterring Manipulation in a Cognitive Hierarchy

arXiv.org Artificial Intelligence

Social agents with finitely nested opponent models are vulnerable to manipulation by agents with deeper reasoning and more sophisticated opponent modelling. This imbalance, rooted in logic and the theory of recursive modelling frameworks, cannot be solved directly. We propose a computational framework, $\aleph$-IPOMDP, augmenting model-based RL agents' Bayesian inference with an anomaly detection algorithm and an out-of-belief policy. Our mechanism allows agents to realize they are being deceived, even if they cannot understand how, and to deter opponents via a credible threat. We test this framework in both a mixed-motive and zero-sum game. Our results show the $\aleph$ mechanism's effectiveness, leading to more equitable outcomes and less exploitation by more sophisticated agents. We discuss implications for AI safety, cybersecurity, cognitive science, and psychiatry.


Predicting the Future with Simple World Models

arXiv.org Artificial Intelligence

World models can represent potentially high-dimensional pixel observations in compact latent spaces, making it tractable to model the dynamics of the environment. However, the latent dynamics inferred by these models may still be highly complex. Abstracting the dynamics of the environment with simple models can have several benefits. If the latent dynamics are simple, the model may generalize better to novel transitions, and discover useful latent representations of environment states. We propose a regularization scheme that simplifies the world model's latent dynamics. Our model, the Parsimonious Latent Space Model (PLSM), minimizes the mutual information between latent states and the dynamics that arise between them. This makes the dynamics softly state-invariant, and the effects of the agent's actions more predictable. We combine the PLSM with three different model classes used for i) future latent state prediction, ii) video prediction, and iii) planning. We find that our regularization improves accuracy, generalization, and performance in downstream tasks.


The Inner Sentiments of a Thought

arXiv.org Artificial Intelligence

Transformer-based large-scale language models (LLMs) are able to generate highly realistic text. They are duly able to express, and at least implicitly represent, a wide range of sentiments and color, from the obvious, such as valence and arousal to the subtle, such as determination and admiration. We provide a first exploration of these representations and how they can be used for understanding the inner sentimental workings of single sentences. We train predictors of the quantiles of the distributions of final sentiments of sentences from the hidden representations of an LLM applied to prefixes of increasing lengths. After showing that predictors of distributions of valence, determination, admiration, anxiety and annoyance are well calibrated, we provide examples of using these predictors for analyzing sentences, illustrating, for instance, how even ordinary conjunctions (e.g., "but") can dramatically alter the emotional trajectory of an utterance. We then show how to exploit the distributional predictions to generate sentences with sentiments in the tails of distributions. We discuss the implications of our results for the inner workings of thoughts, for instance for psychiatric dysfunction.


Habits of Mind: Reusing Action Sequences for Efficient Planning

arXiv.org Artificial Intelligence

When we exercise sequences of actions, their execution becomes more fluent and precise. Here, we consider the possibility that exercised action sequences can also be used to make planning faster and more accurate by focusing expansion of the search tree on paths that have been frequently used in the past, and by reducing deep planning problems to shallow ones via multi-step jumps in the tree. To capture such sequences, we use a flexible Bayesian action chunking mechanism which finds and exploits statistically reliable structure at different scales. This gives rise to shorter or longer routines that can be embedded into a Monte-Carlo tree search planner. We show the benefits of this scheme using a physical construction task patterned after tangrams.