Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

GX-Chen, Anthony, Lin, Dongyan, Samiei, Mandana, Precup, Doina, Richards, Blake A., Fergus, Rob, Marino, Kenneth

arXiv.org Artificial Intelligence

Language model (LM) agents are increasingly used as autonomous decision-makers which need to actively gather information to guide their decisions. A crucial cognitive skill for such agents is the efficient exploration and understanding of the causal structure of the world -- key to robust, scientifically grounded reasoning. Yet, it remains unclear whether LMs possess this capability or exhibit systematic biases leading to erroneous conclusions. In this work, we examine LMs' ability to explore and infer causal relationships, using the well-established Blicket Test paradigm from developmental psychology. We find that LMs reliably infer the common, intuitive disjunctive causal relationships but systematically struggle with the unusual, yet equally (or sometimes even more) evidenced conjunctive ones. This "disjunctive bias" persists across model families, sizes, and prompting strategies, and performance further declines as task complexity increases. Interestingly, an analogous bias appears in human adults, suggesting that LMs may have inherited deep-seated reasoning heuristics from their training data. To this end, we quantify similarities between LMs and humans, finding that LMs exhibit adult-like inference profiles (but not child-like). Finally, we propose a test-time sampling method which explicitly samples and eliminates hypotheses about causal relationships from the LM. This scalable approach significantly reduces the disjunctive bias and moves LMs closer to the goal of scientific, causally rigorous reasoning.
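
The abstract does not spell out the sampling-and-elimination procedure, so the following is only a minimal Python sketch of the general idea under simplifying assumptions: each hypothesis pairs a candidate blicket set with a disjunctive or conjunctive activation rule, and any hypothesis contradicted by an observed trial is discarded. In the paper the hypotheses are sampled from the LM itself; here they are simply enumerated, and all names (candidate_hypotheses, predicts, eliminate) are hypothetical.

```python
import itertools

# Hypothetical observations from one Blicket-Test episode:
# each trial records the objects placed on the detector and whether it lit up.
trials = [
    ({"A"}, False),
    ({"B"}, False),
    ({"A", "B"}, True),   # evidence for a conjunctive rule
]
objects = ("A", "B", "C")

def candidate_hypotheses():
    """Enumerate simple causal hypotheses: a set of blickets plus a rule.
    A disjunctive rule fires if ANY blicket is present; a conjunctive rule
    fires only if ALL blickets are present."""
    for rule in ("disjunctive", "conjunctive"):
        for r in range(1, len(objects) + 1):
            for blickets in itertools.combinations(objects, r):
                yield frozenset(blickets), rule

def predicts(hypothesis, placed):
    blickets, rule = hypothesis
    if rule == "disjunctive":
        return bool(blickets & placed)
    return blickets <= placed          # conjunctive: every blicket must be present

def eliminate(hypotheses, trials):
    """Discard any hypothesis contradicted by an observed trial."""
    return [h for h in hypotheses
            if all(predicts(h, placed) == lit for placed, lit in trials)]

for blickets, rule in eliminate(list(candidate_hypotheses()), trials):
    print(sorted(blickets), rule)      # only ['A', 'B'] conjunctive survives
```

With the trials above, every disjunctive hypothesis is ruled out and only the conjunctive hypothesis {A, B} survives, illustrating how explicit elimination can counteract a disjunctive default.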


A Appendix

Neural Information Processing Systems

To generate a 2-way question with n inner-shots, two classes are sampled and their support images are captioned "this is a dax" or "this is a blicket" accordingly; one of the two classes is then selected, and its query image is assigned the truncated caption "this is a". In the 5-way setting, the first step instead samples five distinct classes. All images are stored at 224x224 resolution. To generate Real-Name miniImageNet, the same process is followed, except that the nonsense captions (e.g. "this is a dax") are replaced by the true class names (e.g. "this is a fruit bat"). For the evaluations in this paper, we again only take images from the validation set, and we only consider 2-way Fast-VQA. To generate Guided-VQA, the same process is followed, except that step 3 uses the (first) class name. The Open-Ended miniImageNet, Real-Name miniImageNet, Fast-VQA and Guided-VQA evaluations are available at https://fh295.github.io/frozen.html.
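
Since the numbered steps above are only partially recoverable in this copy, the sketch below is a rough Python illustration of how such a 2-way, n-inner-shot episode could be assembled: two classes bound to nonsense words, n captioned support images per class, and a query image with the truncated caption "this is a". All names (generate_two_way_question, images_by_class) are hypothetical, and the details are assumptions rather than the paper's exact procedure.

```python
import random

NONSENSE_NAMES = ("dax", "blicket")

def generate_two_way_question(images_by_class, n_inner_shots, rng=random):
    """Assemble one hypothetical 2-way, n-inner-shot episode.

    `images_by_class` maps each class name to a list of image paths
    (e.g. miniImageNet validation images stored at 224x224 resolution).
    """
    # 1. Sample two distinct classes and bind each to a nonsense word.
    c1, c2 = rng.sample(sorted(images_by_class), 2)
    names = dict(zip((c1, c2), rng.sample(NONSENSE_NAMES, 2)))

    # 2. Sample n support images per class, captioned
    #    "this is a dax" or "this is a blicket" accordingly.
    support = []
    for c in (c1, c2):
        for img in rng.sample(images_by_class[c], n_inner_shots):
            support.append((img, f"this is a {names[c]}"))
    rng.shuffle(support)

    # 3. Select one of the two classes and sample an unused query image from it;
    #    the query gets the truncated caption "this is a", and the model should
    #    complete it with the correct nonsense word.
    query_class = rng.choice((c1, c2))
    used = {img for img, _ in support}
    query_img = rng.choice([i for i in images_by_class[query_class] if i not in used])
    return support, (query_img, "this is a"), names[query_class]
```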


A Appendix

Neural Information Processing Systems

A.1 Compute Usage

The seven billion parameter language model we used as part of Frozen used model parallelism with the strategy from [39] to partition one instance of the model over four accelerators. Each instance had a batch size of 8. To reach a batch size of 128 in this configuration, we additionally employed data parallelism with 16 synchronous replicas. The whole system was trained on a 4x8 TPUv3 [15] topology for about 12 hours, at which point validation performance on Conceptual Captions led us to apply early stopping.

A.2 Frozen Architecture Details

The pretrained transformer language model we used has a GPT-like architecture [30]. It consists of a series of identical residual layers, each comprised of a self-attention operation followed by a position-wise MLP.
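
As a rough illustration of the architecture just described (identical residual layers of self-attention followed by a position-wise MLP), here is a minimal PyTorch sketch of one such block. The layer sizes, the pre-LayerNorm placement, and the GELU activation are placeholder assumptions; the appendix does not specify them here.

```python
import torch
from torch import nn

class GPTBlock(nn.Module):
    """One residual layer of a GPT-like decoder: masked self-attention
    followed by a position-wise MLP, each with a residual connection
    and layer normalisation."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position may only attend to earlier positions.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=mask)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x

# Example: a batch of 2 sequences of 16 token embeddings.
x = torch.randn(2, 16, 512)
print(GPTBlock()(x).shape)  # torch.Size([2, 16, 512])
```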


Multimodal Few-Shot Learning with Frozen Language Models

Menick, Jacob

Neural Information Processing Systems

When trained at sufficient scale, auto-regressive language models exhibit the notable ability to learn a new language task after being prompted with just a few examples. Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language). Using aligned image and caption data, we train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption. The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings. We demonstrate that it can rapidly learn words for new objects and novel visual categories, do visual question-answering with only a handful of examples, and make use of outside knowledge, by measuring a single model on a variety of established and new benchmarks.
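
A minimal PyTorch sketch of the prefix-conditioning idea described above: a trainable vision encoder emits a short sequence of embeddings that is prepended to the frozen language model's token embeddings. The tiny convolutional encoder and the stand-in transformer below are placeholders for illustration, not the NF-ResNet and 7B model used in the paper.

```python
import torch
from torch import nn

class VisualPrefix(nn.Module):
    """Sketch of Frozen-style prefix conditioning: only the vision encoder
    is trained; the language model and its token embeddings stay frozen."""

    def __init__(self, lm: nn.Module, embed: nn.Embedding, d_model: int, n_prefix: int = 2):
        super().__init__()
        self.lm, self.embed = lm, embed
        self.n_prefix, self.d_model = n_prefix, d_model
        for module in (self.lm, self.embed):          # freeze the language model
            for p in module.parameters():
                p.requires_grad_(False)
        # Stand-in vision encoder; it emits n_prefix embeddings of size d_model.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_prefix * d_model),
        )

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        prefix = self.vision(image).view(image.size(0), self.n_prefix, self.d_model)
        text = self.embed(token_ids)                  # frozen token embeddings
        return self.lm(torch.cat([prefix, text], dim=1))

# Tiny usage example with a stand-in "frozen" transformer.
d = 64
lm = nn.TransformerEncoder(nn.TransformerEncoderLayer(d, 4, batch_first=True), num_layers=2)
model = VisualPrefix(lm, nn.Embedding(100, d), d_model=d)
out = model(torch.randn(1, 3, 64, 64), torch.randint(0, 100, (1, 5)))
print(out.shape)  # torch.Size([1, 7, 64]): 2 prefix positions + 5 text tokens
```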


A Bayesian Framework for Cross-Situational Word-Learning

Neural Information Processing Systems

For infants, early word learning is a chicken-and-egg problem. One way to learn a word is to observe that it co-occurs with a particular referent across different situations. Another way is to use the social context of an utterance to infer the intended referent of a word. Here we present a Bayesian model of cross-situational word learning, and an extension of this model that also learns which social cues are relevant to determining reference. We test our model on a small corpus of mother-infant interaction and find it performs better than competing models. Finally, we show that our model accounts for experimental phenomena including mutual exclusivity, fast-mapping, and generalization from social cues.
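
The paper specifies its own model; the Python snippet below is only a toy illustration of the cross-situational principle it builds on: candidate word-referent lexicons are scored by a prior that favours small lexicons and a likelihood that rewards a word's referent being present when the word is used. The corpus, probabilities, and scoring function are invented for illustration and are not the paper's model.

```python
import itertools
import math

# Toy corpus: each situation pairs the objects present with the words uttered.
situations = [
    ({"dog", "ball"}, ["dog", "ball"]),
    ({"dog", "cup"}, ["dog"]),
    ({"cup", "ball"}, ["ball"]),
]
words = sorted({w for _, uttered in situations for w in uttered})
objects = sorted({o for present, _ in situations for o in present})

def log_score(lexicon, alpha=2.0):
    """Crude stand-in for a Bayesian score: the prior penalises large lexicons,
    the likelihood rewards a word's referent being present when the word is used."""
    score = -alpha * len(lexicon)                       # prior: prefer small lexicons
    for present, uttered in situations:
        for w in uttered:
            if w in lexicon:
                score += math.log(0.9 if lexicon[w] in present else 0.05)
            else:
                score += math.log(0.1)                  # word used non-referentially
    return score

# Enumerate lexicons: each word maps to an object or to nothing (None).
best, best_score = None, float("-inf")
for assignment in itertools.product(objects + [None], repeat=len(words)):
    lexicon = {w: o for w, o in zip(words, assignment) if o is not None}
    s = log_score(lexicon)
    if s > best_score:
        best, best_score = lexicon, s

print(best)  # {'ball': 'ball', 'dog': 'dog'} wins on this toy corpus
```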


Theory-Based Causal Inference

Tenenbaum, Joshua B., Griffiths, Thomas L.

Neural Information Processing Systems

People routinely make sophisticated causal inferences unconsciously, effortlessly, and from very little data - often from just one or a few observations. We argue that these inferences can be explained as Bayesian computations over a hypothesis space of causal graphical models, shaped by strong top-down prior knowledge in the form of intuitive theories.
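
As a concrete, much simplified illustration of Bayesian computation over a hypothesis space of causal structures, the sketch below scores blicket-detector hypotheses against a few observations using a uniform prior and a noisy activation law. The hypothesis space, noise level eps, and trial data are assumptions chosen for illustration, not the paper's model.

```python
import itertools
from math import prod

objects = ("A", "B")
trials = [({"A"}, True), ({"A", "B"}, True), ({"B"}, False)]

# Hypothesis space: which subset of the objects are blickets.
hypotheses = [frozenset(s) for r in range(len(objects) + 1)
              for s in itertools.combinations(objects, r)]

def likelihood(blickets, placed, lit, eps=0.05):
    """P(outcome | hypothesis): the detector lights with probability 1 - eps
    when at least one blicket is present, and with probability eps otherwise."""
    p_on = 1 - eps if blickets & placed else eps
    return p_on if lit else 1 - p_on

prior = {h: 1 / len(hypotheses) for h in hypotheses}    # uniform prior over structures
posterior = {h: prior[h] * prod(likelihood(h, placed, lit) for placed, lit in trials)
             for h in hypotheses}
z = sum(posterior.values())
for h in sorted(hypotheses, key=lambda h: -posterior[h]):
    print(set(h) or "{}", round(posterior[h] / z, 3))
```

On these three observations the posterior concentrates on "A alone is a blicket", matching the paper's point that strong conclusions can follow from just a few observations.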

