Goto

Collaborating Authors

 Model-Based Reasoning


CLadder: Assessing Causal Reasoning in Language Models

Neural Information Processing Systems

The ability to perform causal reasoning is widely considered a core feature of intelligence. In this work, we investigate whether large language models (LLMs) can coherently reason about causality. Much of the existing work in natural language processing (NLP) focuses on evaluating commonsense causal reasoning in LLMs, thus failing to assess whether a model can perform causal inference in accordance with a set of well-defined formal rules. To address this, we propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al. We compose a large dataset, CLadder, with 10K samples: based on a collection of causal graphs and queries (associational, interventional, and counterfactual), we obtain symbolic questions and ground-truth answers, through an oracle causal inference engine.


Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

Neural Information Processing Systems

As a pivotal component to attaining generalizable solutions in human intelligence, reasoning provides great potential for reinforcement learning (RL) agents' generalization towards varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a huge gap that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with Causal Graph (CG), a structure built upon the relation between objects and events. We novelly formulate the GCRL problem into variational likelihood maximization with CG as latent variables. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of CG; using CG to learn generalizable models and interpretable policies.


PSD Representations for Effective Probability Models

Neural Information Processing Systems

Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability while being also compatible with two main operations: multiplications of two models (product rule) and marginalization with respect to a subset of the random variables (sum rule). In this work, we show that a recently proposed class of positive semi-definite (PSD) models for non-negative functions is particularly suited to this end. In particular, we characterize both approximation and generalization capabilities of PSD models, showing that they enjoy strong theoretical guarantees. Moreover, we show that we can perform efficiently both sum and product rule in closed form via matrix operations, enjoying the same versatility of mixture models.


Refined Mechanism Design for Approximately Structured Priors via Active Regression

Neural Information Processing Systems

We consider the problem of a revenue-maximizing seller with a large number of items m for sale to n strategic bidders, whose valuations are drawn independently from high-dimensional, unknown prior distributions. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute, and, even when they can be found, are often rife with various counter-intuitive properties. In this paper, following a model introduced recently by Cai and Daskalakis [CD22], we consider the case that bidders' prior distributions can be well-approximated by a topic model. We design an active learning component, responsible for interacting with the bidders and outputting low-dimensional approximations of their types, and a mechanism design component, responsible for robustifying mechanisms for the low-dimensional model to work for the approximate types of the former component. On the active learning front, we cast our problem in the framework of Randomized Linear Algebra (RLA) for regression problems, allowing us to import several breakthrough results from that line of research, and adapt them to our setting.


Temporal Causal Reasoning with (Non-Recursive) Structural Equation Models

arXiv.org Artificial Intelligence

Structural Equation Models (SEM) are the standard approach to representing causal dependencies between variables in causal models. In this paper we propose a new interpretation of SEMs when reasoning about Actual Causality, in which SEMs are viewed as mechanisms transforming the dynamics of exogenous variables into the dynamics of endogenous variables. This allows us to combine counterfactual causal reasoning with existing temporal logic formalisms, and to introduce a temporal logic, CPLTL, for causal reasoning about such structures. We show that the standard restriction to so-called \textit{recursive} models (with no cycles in the dependency graph) is not necessary in our approach, allowing us to reason about mutually dependent processes and feedback loops. Finally, we introduce new notions of model equivalence for temporal causal models, and show that CPLTL has an efficient model-checking procedure.


NOMTO: Neural Operator-based symbolic Model approximaTion and discOvery

arXiv.org Artificial Intelligence

While many physical and engineering processes are most effectively described by non-linear symbolic models, existing non-linear symbolic regression (SR) methods are restricted to a limited set of continuous algebraic functions, thereby limiting their applicability to discover higher order non-linear differential relations. In this work, we introduce the Neural Operator-based symbolic Model approximaTion and discOvery (NOMTO) method, a novel approach to symbolic model discovery that leverages Neural Operators to encompass a broad range of symbolic operations. We demonstrate that NOMTO can successfully identify symbolic expressions containing elementary functions with singularities, special functions, and derivatives. Additionally, our experiments demonstrate that NOMTO can accurately rediscover second-order non-linear partial differential equations. It provides a powerful and flexible tool for model discovery, capable of capturing complex relations in a variety of physical systems. Many physical and engineering processes are most effectively described by concise mathematical expressions derived through meticulous observation and analysis. The accuracy of these models is highly dependent on the quality and quantity of available data. With the emergence of large-scale datasets across diverse physical and engineering domains, deriving compact mathematical models in the form of symbolic expressions has become increasingly attainable. This methodology, known as symbolic regression (SR), aims to identify mathematical expressions that most accurately represent given datasets. SR has become indispensable in fields such as physics, biology, and engineering, where it advances knowledge and fosters innovation by uncovering underlying principles and facilitating the development of interpretable predictive models. In recent years, deep learning-based approaches have significantly advanced the field of SR by leveraging neural networks to identify mathematical expressions directly from data.


Cause

arXiv.org Artificial Intelligence

Causal Learning has emerged as a major theme of AI in recent years, promising to use special techniques to reveal the true nature of cause and effect in a number of important domains. We consider the Epistemology of learning and recognizing true cause and effect phenomena. Through thought exercises on the customary use of the word ''cause'', especially in scientific domains, we investigate what, in practice, constitutes a valid causal claim. We recognize the word's uses across scientific domains in disparate form but consistent function within the scientific paradigm. We highlight fundamental distinctions of practice that can be performed in the natural and social sciences, highlight the importance of many systems of interest being open and irreducible and identify the important notion of Hermeneutic knowledge for social science inquiry. We posit that the distinct properties require that definitive causal claims can only come through an agglomeration of consistent evidence across multiple domains and levels of abstraction, such as empirical, physiological, biochemical, etc. We present Cognitive Science as an exemplary multi-disciplinary field providing omnipresent opportunity for such a Research Program, and highlight the main general modes of practice of scientific inquiry that can adequately merge, rather than place as incorrigibly conflictual, multi-domain multi-abstraction scientific practices and language games.


Recommendations for Comprehensive and Independent Evaluation of Machine Learning-Based Earth System Models

arXiv.org Artificial Intelligence

Machine learning (ML) is a revolutionary technology with demonstrable applications across multiple disciplines. Within the Earth science community, ML has been most visible for weather forecasting, producing forecasts that rival modern physics-based models. Given the importance of deepening our understanding and improving predictions of the Earth system on all time scales, efforts are now underway to develop forecasting models into Earth-system models (ESMs), capable of representing all components of the coupled Earth system (or their aggregated behavior) and their response to external changes. Modeling the Earth system is a much more difficult problem than weather forecasting, not least because the model must represent the alternate (e.g., future) coupled states of the system for which there are no historical observations. Given that the physical principles that enable predictions about the response of the Earth system are often not explicitly coded in these ML-based models, demonstrating the credibility of ML-based ESMs thus requires us to build evidence of their consistency with the physical system. To this end, this paper puts forward five recommendations to enhance comprehensive, standardized, and independent evaluation of ML-based ESMs to strengthen their credibility and promote their wider use.


Digital Twin Calibration with Model-Based Reinforcement Learning

arXiv.org Artificial Intelligence

This study is motivated by optimal control applications that exhibit high complexity, high uncertainty, and very limited data [Wang et al., 2024, Zheng et al., 2023, Plotkin et al., 2017, Mirasol, 2017]. In particular, all of these challenges are present in the domain of biopharmaceutical manufacturing, used for production of essential life-saving treatments for severe and chronic diseases, including cancers, autoimmune disorders, metabolic diseases, genetic disorders, and infectious diseases such as COVID-19 [Zahavi and Weiner, 2020, Teo, 2022]. Using cells as factories, biomanufacturing involves hundreds of biological, physical, and chemical factors dynamically interacting with each other at molecular, cellular, and macroscopic levels and impacting production outcomes. Due to the complexity of these mechanisms, it is quite difficult to control production safely and effectively, especially in the presence of very limited data. Digital twins have proven very useful in guiding the control of complex physical systems [Tao et al., 2018].


On The Causal Network Of Face-selective Regions In Human Brain During Movie Watching

arXiv.org Artificial Intelligence

Understanding the causal interactions in simple brain tasks, such as face detection, remains a challenging and ambiguous process for researchers. In this study, we address this issue by employing a novel causal discovery method -- Directed Acyclic Graphs via M-matrices for Acyclicity (DAGMA) -- to investigate the causal structure of the brain's face-selective network and gain deeper insights into its mechanism. Using natural movie stimuli, we extract causal network of face-selective regions and analyze how frames containing faces influence this network. Our findings reveal that the presence of faces in the stimuli have causal effect both on the number and strength of causal connections within the network. Additionally, our results highlight the crucial role of subcortical regions in satisfying causal sufficiency, emphasizing its importance in causal studies of brain. This study provides a new perspective on understanding the causal architecture of the face-selective network of the brain, motivating further research on neural causality.