Wang, Jane X.
A social path to human-like artificial intelligence
Duéñez-Guzmán, Edgar A., Sadedin, Suzanne, Wang, Jane X., McKee, Kevin R., Leibo, Joel Z.
Traditionally, cognitive and computer scientists have viewed intelligence solipsistically, as a property of unitary agents devoid of social context. Given the success of contemporary learning algorithms, we argue that the bottleneck in artificial intelligence (AI) progress is shifting from data assimilation to novel data generation. We bring together evidence showing that natural intelligence emerges at multiple scales in networks of interacting agents via collective living, social relationships and major evolutionary transitions, which contribute to novel data generation through mechanisms such as population pressures, arms races, Machiavellian selection, social learning and cumulative culture. Many breakthroughs in AI exploit some of these processes, from multi-agent structures enabling algorithms to master complex games like Capture-The-Flag and StarCraft II, to strategic communication in Diplomacy and the shaping of AI data streams by other AIs. Moving beyond a solipsistic view of agency to integrate these mechanisms suggests a path to human-like compounding innovation through ongoing novel data generation.
Scaling Instructable Agents Across Many Simulated Worlds
SIMA Team, Raad, Maria Abi, Ahuja, Arun, Barros, Catarina, Besse, Frederic, Bolt, Andrew, Bolton, Adrian, Brownfield, Bethanie, Buttimore, Gavin, Cant, Max, Chakera, Sarah, Chan, Stephanie C. Y., Clune, Jeff, Collister, Adrian, Copeman, Vikki, Cullum, Alex, Dasgupta, Ishita, de Cesare, Dario, Di Trapani, Julia, Donchev, Yani, Dunleavy, Emma, Engelcke, Martin, Faulkner, Ryan, Garcia, Frankie, Gbadamosi, Charles, Gong, Zhitao, Gonzales, Lucy, Gupta, Kshitij, Gregor, Karol, Hallingstad, Arne Olav, Harley, Tim, Haves, Sam, Hill, Felix, Hirst, Ed, Hudson, Drew A., Hudson, Jony, Hughes-Fitt, Steph, Rezende, Danilo J., Jasarevic, Mimi, Kampis, Laura, Ke, Rosemary, Keck, Thomas, Kim, Junkyung, Knagg, Oscar, Kopparapu, Kavya, Lampinen, Andrew, Legg, Shane, Lerchner, Alexander, Limont, Marjorie, Liu, Yulan, Loks-Thompson, Maria, Marino, Joseph, Cussons, Kathryn Martin, Matthey, Loic, Mcloughlin, Siobhan, Mendolicchio, Piermaria, Merzic, Hamza, Mitenkova, Anna, Moufarek, Alexandre, Oliveira, Valeria, Oliveira, Yanko, Openshaw, Hannah, Pan, Renke, Pappu, Aneesh, Platonov, Alex, Purkiss, Ollie, Reichert, David, Reid, John, Richemond, Pierre Harvey, Roberts, Tyson, Ruscoe, Giles, Elias, Jaume Sanchez, Sandars, Tasha, Sawyer, Daniel P., Scholtes, Tim, Simmons, Guy, Slater, Daniel, Soyer, Hubert, Strathmann, Heiko, Stys, Peter, Tam, Allison C., Teplyashin, Denis, Terzi, Tayfun, Vercelli, Davide, Vujatovic, Bojan, Wainwright, Marcus, Wang, Jane X., Wang, Zhengdong, Wierstra, Daan, Williams, Duncan, Wong, Nathaniel, York, Sarah, Young, Nick
Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to carry out complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real time using a generic, human-like interface: the inputs are image observations and language instructions, and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
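As a rough illustration of the interface described in this abstract, the sketch below shows an agent that maps a frame plus a free-form instruction to keyboard-and-mouse output; the class and field names are assumptions made for illustration, not part of the SIMA codebase.

```python
import numpy as np
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class KeyboardMouseAction:
    """Assumed action format: a human-like keyboard-and-mouse command."""
    keys_pressed: List[str] = field(default_factory=list)  # e.g., ["w", "space"]
    mouse_dx: float = 0.0                                   # relative cursor movement
    mouse_dy: float = 0.0
    left_click: bool = False


class InstructableAgent(Protocol):
    def act(self, image: np.ndarray, instruction: str) -> KeyboardMouseAction:
        """Map the current frame and a free-form language instruction to a low-level action."""
        ...
```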
CogBench: a large language model walks into a psychology lab
Coda-Forno, Julian, Binz, Marcel, Wang, Jane X., Schulz, Eric
Large language models (LLMs) have significantly advanced the field of artificial intelligence. Yet, evaluating them comprehensively remains challenging. We argue that this is partly due to the predominant focus on performance metrics in most benchmarks. This paper introduces CogBench, a benchmark that includes ten behavioral metrics derived from seven cognitive psychology experiments. This novel approach offers a toolkit for phenotyping LLMs' behavior. We apply CogBench to 35 LLMs, yielding a rich and diverse dataset. We analyze this data using statistical multilevel modeling techniques, accounting for the nested dependencies among fine-tuned versions of specific LLMs. Our study highlights the crucial role of model size and reinforcement learning from human feedback (RLHF) in improving performance and aligning with human behavior. Interestingly, we find that open-source models are less risk-prone than proprietary models and that fine-tuning on code does not necessarily enhance LLMs' behavior. Finally, we explore the effects of prompt-engineering techniques. We discover that chain-of-thought prompting improves probabilistic reasoning, while take-a-step-back prompting fosters model-based behaviors.
Meta-in-context learning in large language models
Coda-Forno, Julian, Binz, Marcel, Akata, Zeynep, Botvinick, Matthew, Wang, Jane X., Schulz, Eric
Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. We coin this phenomenon meta-in-context learning. Looking at two idealized domains, a one-dimensional regression task and a two-armed bandit task, we show that meta-in-context learning adaptively reshapes a large language model's priors over expected tasks. Furthermore, we find that meta-in-context learning modifies the in-context learning strategies of such models. Finally, we extend our approach to a benchmark of real-world regression problems, where we observe performance competitive with traditional learning algorithms. Taken together, our work improves our understanding of in-context learning and paves the way toward adapting large language models to the environments in which they are applied purely through meta-in-context learning rather than through traditional finetuning.
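The bandit setting above can be pictured with a short sketch. The snippet below is an illustrative assumption, not the paper's code; `query_llm` is a hypothetical stand-in for an LLM call. The key point is that earlier tasks stay in the prompt, so in-context learning on later tasks is itself shaped by prior in-context experience.

```python
import random


def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; here it guesses randomly so the sketch runs."""
    return random.choice(["machine 1", "machine 2"])


def run_bandit_task(prompt: str, n_trials: int = 10, p_reward=(0.3, 0.7)) -> str:
    """Play one two-armed bandit task in-context, appending each trial's outcome to the prompt."""
    prompt += "\nNew slot-machine game. Each round, choose machine 1 or machine 2.\n"
    for t in range(n_trials):
        choice = query_llm(prompt + f"Round {t + 1}: which machine do you choose?")
        arm = 1 if "2" in choice else 0
        reward = int(random.random() < p_reward[arm])
        prompt += f"Round {t + 1}: chose machine {arm + 1}, received reward {reward}.\n"
    return prompt


# Meta-in-context learning: earlier bandit tasks remain in the prompt, so the model's
# in-context learning on later tasks is shaped by its earlier in-context experience.
history = ""
for task in range(3):
    history = run_bandit_task(history)
```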
Semantic Exploration from Language Abstractions and Pretrained Representations
Tam, Allison C., Rabinowitz, Neil C., Lampinen, Andrew K., Roy, Nicholas A., Chan, Stephanie C. Y., Strouse, DJ, Wang, Jane X., Banino, Andrea, Hill, Felix
Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous partially-observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluate vision-language representations, pretrained on natural image captioning datasets. We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance on 3D simulated environments. We also characterize why and how language provides useful abstractions for exploration by considering the impacts of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach with on- and off-policy RL algorithms and in two very different task domains -- one that stresses the identification and manipulation of everyday objects, and one that requires navigational exploration in an expansive world. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments.
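One simple way to instantiate this idea is to pay an intrinsic bonus for states that are far, in the embedding space of a frozen vision-language encoder, from anything visited so far. The sketch below is a minimal illustration under that assumption; the `embed` stand-in and the nearest-neighbour bonus are simplifications, not the paper's exact method.

```python
import numpy as np


def embed(observation):
    """Assumed: a frozen vision-language encoder (e.g., the visual tower of an
    image-captioning model). Here a stand-in that simply flattens the observation."""
    return np.asarray(observation, dtype=np.float32).ravel()


class SemanticNoveltyBonus:
    """Intrinsic reward from novelty measured in a semantic embedding space: states far
    (in embedding distance) from anything seen so far receive a larger bonus."""

    def __init__(self, scale: float = 1.0):
        self.memory = []          # embeddings of previously visited states
        self.scale = scale

    def __call__(self, observation) -> float:
        z = embed(observation)
        if not self.memory:
            bonus = self.scale
        else:
            dists = [np.linalg.norm(z - m) for m in self.memory]
            bonus = self.scale * min(dists)   # distance to the nearest visited embedding
        self.memory.append(z)
        return bonus
```

In practice such a bonus would be added to the task reward at each step when training an on- or off-policy RL agent.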
Meta-Learned Models of Cognition
Binz, Marcel, Dasgupta, Ishita, Jagadish, Akshay, Botvinick, Matthew, Wang, Jane X., Schulz, Eric
Meta-learning is a framework for learning learning algorithms through repeated interactions with an environment as opposed to designing them by hand. In recent years, this framework has established itself as a promising tool for building models of human cognition. Yet, a coherent research program around meta-learned models of cognition is still missing. The purpose of this article is to synthesize previous work in this field and establish such a research program. We rely on three key pillars to accomplish this goal. We first point out that meta-learning can be used to construct Bayes-optimal learning algorithms. This result not only implies that any behavioral phenomenon that can be explained by a Bayesian model can also be explained by a meta-learned model but also allows us to draw strong connections to the rational analysis of cognition. We then discuss several advantages of the meta-learning framework over traditional Bayesian methods. In particular, we argue that meta-learning can be applied to situations where Bayesian inference is impossible and that it enables us to make rational models of cognition more realistic, either by incorporating limited computational resources or neuroscientific knowledge. Finally, we reexamine prior studies from psychology and neuroscience that have applied meta-learning and put them into the context of these new insights. In summary, our work highlights that meta-learning considerably extends the scope of rational analysis and thereby of cognitive theories more generally.
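The first pillar, that meta-learning over tasks drawn from a prior yields a Bayes-optimal learner, can be illustrated with a toy example. In the sketch below, a predictor trained across many coins whose biases are drawn from a Beta prior converges to the Bayesian posterior-mean prediction; the tabular "meta-learner" is an assumption made for brevity and stands in for the recurrent networks used in the literature.

```python
import numpy as np
from collections import defaultdict

# Tasks are coins whose bias is drawn from a Beta(2, 2) prior. A learner trained across
# many such tasks to predict the next flip from the counts observed so far converges to
# the Bayes-optimal (posterior-mean) prediction for that prior.
rng = np.random.default_rng(0)
alpha, beta, n_flips = 2.0, 2.0, 10

sums = defaultdict(float)   # running sums of the next outcome,
counts = defaultdict(int)   # keyed by the (heads, tails) history seen so far

for _ in range(50_000):                      # meta-training over tasks sampled from the prior
    p = rng.beta(alpha, beta)                # sample a task (coin bias)
    flips = rng.random(n_flips) < p
    heads = 0
    for t, flip in enumerate(flips):
        key = (heads, t - heads)             # (heads, tails) observed so far
        sums[key] += flip
        counts[key] += 1
        heads += flip

# After meta-training, the learned prediction matches the Bayesian posterior mean.
for key in [(0, 0), (3, 1), (1, 4)]:
    learned = sums[key] / counts[key]
    bayes = (alpha + key[0]) / (alpha + beta + key[0] + key[1])
    print(key, round(learned, 3), round(bayes, 3))
```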
Data Distributional Properties Drive Emergent In-Context Learning in Transformers
Chan, Stephanie C. Y., Santoro, Adam, Lampinen, Andrew K., Wang, Jane X., Singh, Aaditya, Richemond, Pierre H., McClelland, Jay, Hill, Felix
Large transformer-based models are able to perform in-context few-shot learning, without being explicitly trained for it. This observation raises the question: what aspects of the training regime lead to this emergent behavior? Here, we show that this behavior is driven by the distributions of the training data itself. In-context learning emerges when the training data exhibits particular distributional properties such as burstiness (items appear in clusters rather than being uniformly distributed over time) and having large numbers of rarely occurring classes. In-context learning also emerges more strongly when item meanings or interpretations are dynamic rather than fixed. These properties are exemplified by natural language, but are also inherent to naturalistic data in a wide range of other domains. They also depart significantly from the uniform, i.i.d. training distributions typically used for standard supervised learning. In our initial experiments, we found that in-context learning traded off against more conventional weight-based learning, and models were unable to achieve both simultaneously. However, our later experiments uncovered that the two modes of learning could co-exist in a single model when it was trained on data following a skewed Zipfian distribution -- another common property of naturalistic data, including language. In further experiments, we found that naturalistic data distributions were only able to elicit in-context learning in transformers, and not in recurrent models. In sum, our findings indicate how the transformer architecture works together with particular properties of the training data to drive the intriguing emergent in-context learning behaviour of large language models, and how future work might encourage both in-context and in-weights learning in domains beyond language.
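For concreteness, the sketch below generates data with the two highlighted properties: a Zipfian marginal over classes and bursty contexts in which the query's class recurs. The bare class labels stand in for the image-label pairs used in the paper, and the exact construction is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, context_len = 1_000, 8

# Zipfian class frequencies: P(class) proportional to 1/rank.
ranks = np.arange(1, n_classes + 1)
zipf_probs = (1.0 / ranks) / (1.0 / ranks).sum()


def bursty_sequence():
    """A context of `context_len` items followed by a query whose class recurs in the context."""
    query_class, distractor_class = rng.choice(n_classes, size=2, replace=False, p=zipf_probs)
    # Burstiness: the query's class and one distractor class each appear several times
    # within the same context window rather than being spread uniformly over time.
    context = np.array([query_class] * (context_len // 2) + [distractor_class] * (context_len // 2))
    rng.shuffle(context)
    return context, query_class


context, query = bursty_sequence()
```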
Tell me why! -- Explanations support learning of relational and causal structure
Lampinen, Andrew K., Roy, Nicholas A., Dasgupta, Ishita, Chan, Stephanie C. Y., Tam, Allison C., McClelland, James L., Yan, Chen, Santoro, Adam, Rabinowitz, Neil C., Wang, Jane X., Hill, Felix
Explanations play a considerable role in human learning, especially in areas that remain major challenges for AI -- forming abstractions, and learning about the relational and causal structure of the world. Here, we explore whether reinforcement learning agents might likewise benefit from explanations. We outline a family of relational tasks that involve selecting an object that is the odd one out in a set (i.e., unique along one of many possible feature dimensions). Odd-one-out tasks require agents to reason over multi-dimensional relationships among a set of objects. We show that agents do not learn these tasks well from reward alone, but achieve >90% performance when they are also trained to generate language explaining object properties or why a choice is correct or incorrect. In further experiments, we show how predicting explanations enables agents to generalize appropriately from ambiguous, causally-confounded training, and even to meta-learn to perform experimental interventions to identify causal structure. We show that explanations help overcome the tendency of agents to fixate on simple features, and explore which aspects of explanations make them most beneficial. Our results suggest that learning from explanations is a powerful principle that could offer a promising path towards training more robust and general machine learning systems.
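A stripped-down version of such an odd-one-out episode can be written in a few lines. The feature values and explanation template below are illustrative assumptions rather than the authors' task code, and the distractor objects are deliberately kept identical for brevity.

```python
import random

FEATURES = {
    "color": ["red", "blue", "green"],
    "shape": ["cube", "sphere", "cone"],
    "texture": ["smooth", "striped", "dotted"],
}


def make_episode(n_objects=4):
    """One object is unique along a single feature dimension; the explanation names why."""
    odd_dim = random.choice(list(FEATURES))
    common = {dim: random.choice(vals) for dim, vals in FEATURES.items()}
    odd_value = random.choice([v for v in FEATURES[odd_dim] if v != common[odd_dim]])

    objects = [dict(common) for _ in range(n_objects)]
    odd_index = random.randrange(n_objects)
    objects[odd_index][odd_dim] = odd_value

    explanation = f"object {odd_index} is correct because it is the only {odd_value} one"
    return objects, odd_index, explanation


objects, answer, explanation = make_episode()
```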
Alchemy: A structured task distribution for meta-reinforcement learning
Wang, Jane X., King, Michael, Porcel, Nicolas, Kurth-Nelson, Zeb, Zhu, Tina, Deck, Charlie, Choy, Peter, Cassin, Mary, Reynolds, Malcolm, Song, Francis, Buttimore, Gavin, Reichert, David P., Rabinowitz, Neil, Matthey, Loic, Hassabis, Demis, Lerchner, Alexander, Botvinick, Matthew
There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.
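A toy analogue of the benchmark's central idea, a latent causal structure resampled every episode, can be sketched as follows; the states, potions, and dynamics below are invented for illustration and are not Alchemy's actual rules.

```python
import random

STATES, POTIONS = ["dull", "shiny", "glowing"], ["red", "green", "blue"]


def new_episode_dynamics():
    """Resample the episode's hidden causal structure: how each potion transforms each state."""
    return {(s, p): random.choice(STATES) for s in STATES for p in POTIONS}


dynamics = new_episode_dynamics()        # latent structure, hidden from the agent
state = random.choice(STATES)
for _ in range(5):                       # the agent experiments to identify the structure
    potion = random.choice(POTIONS)      # stand-in for the agent's chosen action
    state = dynamics[(state, potion)]
```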
Meta-learning in natural and artificial intelligence
Wang, Jane X.
Humans are remarkable for continuously learning throughout the entirety of their lives, from acquiring physical reasoning and language skills at a young age [64, 43], to the ability to reason about the detailed complexities inherent in everyday adult life. One key quality of this learning is that it happens at multiple scales, in terms of both time and abstraction, in a process termed meta-learning or learning to learn. The fundamental principle of meta-learning is that learning proceeds faster with more experience, via the acquisition of inductive biases or knowledge that allows for more efficient learning in the future [66, 59, 57]. These favorable properties of meta-learning have recently gained it considerable renewed interest within the deep learning/artificial intelligence community. Despite their tremendous successes in recent years [46, 61], deep learning systems still require orders of magnitude more data than humans [40, 12]. Although early work demonstrated the feasibility of neural networks discovering their own learning rules [10, 58], it was only recently that the field experienced a resurgence of new research in meta-learning using deep neural networks. This work has demonstrated the wide-ranging potential of neural networks to meta-learn all aspects of the learning process. Deep neural networks are typically trained via backpropagation, which adjusts the weights of the network so that, given a set of input data, its outputs match some desired target outputs (e.g., classification labels).
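For readers new to the mechanics referenced in that last sentence, the following minimal numpy sketch (illustrative, not taken from the article) fits a small two-layer network to a toy regression target by propagating the prediction error backwards through the layers and stepping the weights along the negative gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 1))
y = np.sin(3 * X)                               # desired target outputs

W1, b1 = rng.normal(0, 0.5, (1, 32)), np.zeros(32)
W2, b2 = rng.normal(0, 0.5, (32, 1)), np.zeros(1)
lr = 0.05

for step in range(2000):
    h = np.tanh(X @ W1 + b1)                    # forward pass: hidden layer
    pred = h @ W2 + b2                          # forward pass: output layer
    err = pred - y                              # dLoss/dpred for 0.5 * squared error

    # Backward pass: chain rule through each layer.
    grad_W2 = h.T @ err / len(X)
    grad_b2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)            # derivative of tanh
    grad_W1 = X.T @ dh / len(X)
    grad_b1 = dh.mean(axis=0)

    # Gradient descent: adjust weights to better match the targets.
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
```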