CausalQuest: Collecting Natural Causal Questions for AI Agents
Ceraolo, Roberto; Kharlapenko, Dmitrii; Reymond, Amélie; Mihalcea, Rada; Sachan, Mrinmaya; Schölkopf, Bernhard; Jin, Zhijing
Humans have an innate drive to seek out causality. Whether fuelled by curiosity or specific goals, we constantly question why things happen, how they are interconnected, and many other related phenomena. To develop AI agents capable of addressing this natural human quest for causality, we urgently need a comprehensive dataset of natural causal questions. Unfortunately, existing datasets either contain only artificially crafted questions that do not reflect real AI usage scenarios or have limited coverage of questions from specific sources. To address this gap, we present CausalQuest, a dataset of 13,500 naturally occurring questions sourced from social networks, search engines, and AI assistants. We formalize the definition of causal questions and establish a taxonomy for finer-grained classification. Through a combined effort of human annotators and large language models (LLMs), we carefully label the dataset. We find that 42% of the questions humans ask are indeed causal, with the majority seeking to understand the causes behind given effects. Using this dataset, we train efficient classifiers (up to 2.85B parameters) for the binary task of identifying causal questions, achieving high performance with F1 scores of up to 0.877. We conclude with a rich set of future research directions that can build upon our data and models.
May-30-2024
- Country:
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- North America
- United States
- New York (0.04)
- Michigan (0.04)
- Maryland > Montgomery County
- Gaithersburg (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- California > Santa Clara County
- Stanford (0.04)
- Canada
- Quebec > Montreal (0.04)
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Alberta > Census Division No. 6
- Calgary Metropolitan Region > Calgary (0.04)
- Europe
- Switzerland > Zürich
- Zürich (0.04)
- Netherlands > North Holland
- Amsterdam (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- Greater Manchester > Manchester (0.04)
- Italy > Liguria
- Genoa (0.04)
- Greece > Attica
- Athens (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Middle East
- Cyprus (0.04)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Germany > Baden-Württemberg
- Tübingen Region > Tübingen (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- Singapore (0.04)
- Indonesia > Bali (0.04)
- China > Hong Kong (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- India > Karnataka
- Bengaluru (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine > Therapeutic Area (1.00)
- Government (1.00)
- Education (1.00)
- Banking & Finance > Real Estate (0.68)
- Information Technology > Services (0.65)