CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, Bernhard Schölkopf
arXiv.org Artificial Intelligence
The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers, through an oracle causal inference engine. These are then translated into natural language. We evaluate multiple LLMs on our dataset, and we introduce and evaluate a bespoke chain-of-thought prompting strategy, CausalCoT. We show that our task is highly challenging for LLMs, and we conduct an in-depth analysis to gain deeper insights into the causal reasoning abilities of LLMs. Our data is open-sourced at https://huggingface.co/datasets/causalNLP/cladder, and our code can be found at https://github.com/causalNLP/cladder.
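To make the distinction between associational and interventional queries concrete, the following is a minimal sketch on a hypothetical toy model. It is not the authors' oracle causal inference engine and is not taken from the CLadder dataset: the graph (a confounder Z affecting both X and Y, plus X affecting Y), the probability tables, and the function names are all assumptions chosen for illustration. It covers only the first two kinds of query; counterfactual queries are not shown.

```python
from fractions import Fraction as F

# Hypothetical toy model (not from CLadder): Z -> X, Z -> Y, and X -> Y.
# All variables are binary; all numbers are made up for illustration.
p_z = {0: F(1, 2), 1: F(1, 2)}               # P(Z=z)
p_x1_given_z = {0: F(1, 5), 1: F(4, 5)}      # P(X=1 | Z=z)
p_y1_given_xz = {                            # P(Y=1 | X=x, Z=z)
    (0, 0): F(1, 10), (0, 1): F(1, 2),
    (1, 0): F(3, 10), (1, 1): F(7, 10),
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def associational(x):
    """Associational query: P(Y=1 | X=x), conditioning on observed X."""
    joint = sum(p_z[z] * p_x_given_z(x, z) * p_y1_given_xz[(x, z)] for z in p_z)
    marginal = sum(p_z[z] * p_x_given_z(x, z) for z in p_z)
    return joint / marginal


def interventional(x):
    """Interventional query: P(Y=1 | do(X=x)), via backdoor adjustment over Z."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in p_z)


if __name__ == "__main__":
    print("P(Y=1 | X=1)     =", associational(1))
    print("P(Y=1 | do(X=1)) =", interventional(1))
    print("Average treatment effect =", interventional(1) - interventional(0))
```

In this toy model the observed association between X and Y is larger than the effect of intervening on X, because Z raises both. A question that asks about the effect of an intervention therefore cannot be answered by reading off the conditional probability alone.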
Jan-17-2024
- Country:
- Africa
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Middle East > Morocco (0.04)
- Asia
- China > Hong Kong (0.04)
- India > West Bengal
- Kharagpur (0.04)
- Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Europe
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Germany
- Baden-Württemberg > Tübingen Region
- Tübingen (0.04)
- Berlin (0.04)
- Spain
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Valencian Community > Valencia Province
- Valencia (0.04)
- Sweden
- Uppsala County > Uppsala (0.04)
- Vaestra Goetaland > Gothenburg (0.04)
- Switzerland > Zürich
- Zürich (0.04)
- United Kingdom
- England > Cambridgeshire
- Cambridge (0.04)
- Scotland > City of Edinburgh
- Edinburgh (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- Quebec > Montreal (0.04)
- Greenland (0.04)
- United States
- California > San Diego County
- San Diego (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Oceania > Australia
- South America > Chile
- Genre:
- Overview (0.68)
- Research Report (0.81)
- Workflow (0.94)
- Industry:
- Education (0.67)
- Health & Medicine > Therapeutic Area
- Immunology (0.93)
- Infections and Infectious Diseases (0.92)
- Technology: