Casolo, Cecilia
An Asymmetric Independence Model for Causal Discovery on Path Spaces
Manten, Georg, Casolo, Cecilia, Mogensen, Søren Wengel, Kilbertus, Niki
We develop the theory linking 'E-separation' in directed mixed graphs (DMGs) with conditional independence relations among coordinate processes in stochastic differential equations (SDEs), where causal relationships are determined by "which variables enter the governing equation of which other variables". We prove a global Markov property for cyclic SDEs, which naturally extends to partially observed cyclic SDEs, because our asymmetric independence model is closed under marginalization. We then characterize the class of graphs that encode the same set of independence relations, yielding a result analogous to the seminal 'same skeleton and v-structures' result for directed acyclic graphs (DAGs). In the fully observed case, we show that each such equivalence class of graphs has a greatest element as a parsimonious representation and develop algorithms to identify this greatest element from data. We conjecture that a greatest element also exists under partial observations, which we verify computationally for graphs with up to four nodes.
Your Assumed DAG is Wrong and Here's How To Deal With It
Padh, Kirtan, Li, Zhufeng, Casolo, Cecilia, Kilbertus, Niki
Assuming a directed acyclic graph (DAG) that represents prior knowledge of causal relationships between variables is a common starting point for cause-effect estimation. Existing literature typically invokes hypothetical domain expert knowledge or causal discovery algorithms to justify this assumption. In practice, neither may propose a single DAG with high confidence. Domain experts are hesitant to rule out dependencies with certainty or have ongoing disputes about relationships. Causal discovery itself often relies on untestable assumptions, may only identify an equivalence class of DAGs, and is commonly sensitive to hyperparameter and threshold choices. We propose an efficient, gradient-based optimization method that provides bounds for causal queries over a collection of causal graphs that is compatible with imperfect prior knowledge but may still be too large for exhaustive enumeration. Our bounds achieve good coverage and sharpness for causal queries such as average treatment effects in linear and non-linear synthetic settings as well as on real-world data. Our approach aims to provide an easy-to-use and widely applicable rebuttal to the valid critique of 'What if your assumed DAG is wrong?'.
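For reference, the naive alternative that this abstract contrasts with, exhaustively enumerating the candidate graphs, can be sketched in a linear setting. Each candidate graph implies a backdoor adjustment set. The average treatment effect (ATE) is estimated by linear regression under each set, and the bounds are the smallest and largest estimates. This is an illustration under stated assumptions: linear-Gaussian data, at most one adjustment covariate, and hypothetical helper names (`ate_linear`, `ate_bounds`). It is not the paper's gradient-based method, which targets collections too large to enumerate.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def ate_linear(x: Sequence[float], y: Sequence[float],
               z: Optional[Sequence[float]] = None) -> float:
    """OLS coefficient of x when regressing y on x (and on z, if given)."""
    if z is None:
        return cov(x, y) / cov(x, x)
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    # Partial regression coefficient of x, controlling for z.
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def ate_bounds(data: Dict[str, List[float]],
               adjustment_sets: Sequence[Tuple[str, ...]]) -> Tuple[float, float]:
    """Min/max ATE estimate over the adjustment sets implied by candidate graphs."""
    estimates = []
    for adj in adjustment_sets:
        z = data[adj[0]] if adj else None  # this sketch supports at most one covariate
        estimates.append(ate_linear(data["X"], data["Y"], z))
    return min(estimates), max(estimates)


if __name__ == "__main__":
    random.seed(0)
    n = 20_000
    # True model: Z -> X, Z -> Y, X -> Y with a causal effect of 2.0.
    Z = [random.gauss(0, 1) for _ in range(n)]
    X = [z + random.gauss(0, 1) for z in Z]
    Y = [2 * x + z + random.gauss(0, 1) for x, z in zip(X, Z)]
    # Candidate graph 1 (Z confounds X and Y) implies adjusting for {Z};
    # candidate graph 2 (no Z-X edge) implies the empty adjustment set.
    lo, hi = ate_bounds({"X": X, "Y": Y, "Z": Z}, [(), ("Z",)])
    print(f"ATE bounds: [{lo:.2f}, {hi:.2f}]")  # close to [2.0, 2.5] for large n
```

The interval reflects only disagreement between candidate graphs, not sampling uncertainty. The enumeration cost grows with the number of candidate graphs, which is the scaling problem the abstract's method is designed to avoid.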
Uncertainty-Aware Optimal Treatment Selection for Clinical Time Series
Schwarz, Thomas, Casolo, Cecilia, Kilbertus, Niki
In personalized medicine, the ability to predict and optimize treatment outcomes across various time frames is essential. Additionally, the ability to select cost-effective treatments within specific budget constraints is critical. Despite recent advancements in estimating counterfactual trajectories, a direct link to optimal treatment selection based on these estimates is missing. This paper introduces a novel method integrating counterfactual estimation techniques and uncertainty quantification to recommend personalized treatment plans adhering to predefined cost constraints. Our approach is distinctive in its handling of continuous treatment variables and its incorporation of uncertainty quantification to improve prediction reliability. We validate our method using two simulated datasets, one focused on the cardiovascular system and the other on COVID-19. Our findings indicate that our method performs robustly across different counterfactual estimation baselines, showing that introducing uncertainty quantification in these settings helps the current baselines find more reliable and accurate treatment selections. The robustness of our method across various settings highlights its potential for broad applicability in personalized healthcare solutions.
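The general idea of combining uncertainty quantification with a cost constraint can be sketched as follows. This is a minimal sketch, not the paper's method, and it rests on several assumptions. First, it supposes a hypothetical ensemble of counterfactual predictors whose spread serves as the uncertainty estimate. Second, the continuous dose is discretized to a grid. Third, each feasible dose is scored pessimistically as mean minus `kappa` times the standard deviation. All names (`select_treatment`, `predict_samples`, `kappa`) are hypothetical.

```python
import statistics
from typing import Callable, List, Optional, Sequence


def select_treatment(
    doses: Sequence[float],
    predict_samples: Callable[[float], List[float]],
    cost: Callable[[float], float],
    budget: float,
    kappa: float = 1.0,
) -> Optional[float]:
    """Pick the dose with the best pessimistic score among those within budget.

    predict_samples(dose) is assumed to return several predicted outcomes
    (e.g. one per ensemble member); higher outcomes are better.
    Returns None if no dose satisfies the budget.
    """
    best_dose, best_score = None, float("-inf")
    for dose in doses:
        if cost(dose) > budget:
            continue  # infeasible: violates the cost constraint
        samples = predict_samples(dose)
        # Penalize uncertain predictions: mean minus kappa standard deviations.
        score = statistics.mean(samples) - kappa * statistics.stdev(samples)
        if score > best_score:
            best_dose, best_score = dose, score
    return best_dose


if __name__ == "__main__":
    # Toy ensemble: mean outcome d * (3 - d), with spread growing linearly in d.
    def predict(d: float) -> List[float]:
        return [d * (3 - d) + d * e for e in (-1.0, 0.0, 1.0)]

    grid = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    print(select_treatment(grid, predict, cost=lambda d: d, budget=2.0, kappa=1.0))  # 1.0
    print(select_treatment(grid, predict, cost=lambda d: d, budget=2.0, kappa=0.0))  # 1.5
```

In the toy example, ignoring uncertainty (`kappa=0`) selects the dose with the highest mean outcome. The uncertainty-aware score instead prefers a smaller dose whose prediction is more reliable.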
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Manten, Georg, Casolo, Cecilia, Ferrucci, Emilio, Mogensen, Søren Wengel, Salvi, Cristopher, Kilbertus, Niki
Inferring the causal structure underlying stochastic dynamical systems from observational data holds great promise in domains ranging from science and health to finance. Such processes can often be accurately modeled via stochastic differential equations (SDEs), which naturally imply causal relationships via "which variables enter the differential of which other variables". In this paper, we develop a kernel-based test of conditional independence (CI) on "path-space" (solutions to SDEs) by leveraging recent advances in signature kernels. We demonstrate strictly superior performance of our proposed CI test compared to existing approaches on path-space. We then develop constraint-based causal discovery algorithms for acyclic stochastic dynamical systems (allowing for self-loops) that leverage temporal information to recover the entire directed graph. Assuming faithfulness and a CI oracle, our algorithm is sound and complete. We empirically verify that our CI test, in conjunction with the causal discovery algorithm, reliably outperforms baselines across a range of settings.
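The basic object behind a signature kernel, the path signature, can be illustrated with a simplified version. This sketch uses the signature truncated at level 2 for a piecewise-linear path. Level 1 holds the total increments, and level 2 holds the iterated integrals, accumulated segment by segment via Chen's identity. The kernel is then the inner product of these truncated features. As a labeled simplification, this is not the (untruncated) signature kernel used in the paper's CI test, which is typically computed by solving a PDE rather than by explicit features. The function names are hypothetical, and the step where Gram matrices feed a CI test statistic is not shown.

```python
from typing import List, Sequence, Tuple

Path = Sequence[Sequence[float]]  # list of points, each a d-dimensional coordinate


def truncated_signature(path: Path) -> Tuple[List[float], List[List[float]]]:
    """Level-1 and level-2 signature terms of a piecewise-linear path."""
    d = len(path[0])
    s1 = [0.0] * d                        # running increments x_t - x_0
    s2 = [[0.0] * d for _ in range(d)]    # iterated integrals over s < t
    for prev, curr in zip(path[:-1], path[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # S2_ij += S1_i * delta_j + delta_i * delta_j / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def truncated_signature_kernel(x: Path, y: Path) -> float:
    """k(x, y) = 1 + <S1(x), S1(y)> + <S2(x), S2(y)>  (levels 0, 1, 2)."""
    sx1, sx2 = truncated_signature(x)
    sy1, sy2 = truncated_signature(y)
    d = len(sx1)
    level1 = sum(a * b for a, b in zip(sx1, sy1))
    level2 = sum(sx2[i][j] * sy2[i][j] for i in range(d) for j in range(d))
    return 1.0 + level1 + level2


if __name__ == "__main__":
    # Two paths with the same endpoints but opposite order of moves.
    x = [[0, 0], [1, 0], [1, 1]]  # right, then up
    y = [[0, 0], [0, 1], [1, 1]]  # up, then right
    print(truncated_signature(x))                # ([1.0, 1.0], [[0.5, 1.0], [0.0, 0.5]])
    print(truncated_signature_kernel(x, x))      # 4.5
    print(truncated_signature_kernel(x, y))      # 3.5
```

The example shows why signatures suit path-space data. Both paths have identical increments, but their level-2 terms differ, so the kernel distinguishes the order in which the coordinates moved.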
Learning Counterfactually Invariant Predictors
Quinzan, Francesco, Casolo, Cecilia, Muandet, Krikamol, Luo, Yucen, Kilbertus, Niki
Invariance, or equivariance, to certain data transformations has proven essential in numerous applications of machine learning (ML), since it can lead to better generalization capabilities [Arjovsky et al., 2019, Bloem-Reddy and Teh, 2020, Chen et al., 2020]. For instance, in image recognition, predictions ought to remain unchanged under scaling, translation, or rotation of the input image. Data augmentation, an early heuristic to promote such invariances, has become indispensable for successfully training deep neural networks (DNNs) [Shorten and Khoshgoftaar, 2019, Xie et al., 2020]. Well-known examples of "invariance by design" include convolutional neural networks (CNNs) for translation invariance [Krizhevsky et al., 2012], group equivariant NNs for general group transformations [Cohen and Welling, 2016], recurrent neural networks (RNNs) and transformers for sequential data [Vaswani et al., 2017], DeepSet [Zaheer et al., 2017] for sets, and graph neural networks (GNNs) for different types of geometric structures [Battaglia et al., 2018]. Many applications in modern ML, however, call for arguably stronger notions of invariance based on causality. This case has been made for image classification, algorithmic fairness [Hardt et al., 2016, Mitchell et al., 2021], robustness [Bühlmann, 2020], and out-of-distribution generalization [Lu et al., 2021]. The goal is invariance with respect to hypothetical manipulations of the data generating process (DGP). Various works develop methods that assume observational distributions (across environments or between training and test) to be governed by shared causal mechanisms but to differ due to various types of distribution shifts encoded by the causal model [Arjovsky et al., 2019, Bühlmann, 2020, Heinze-Deml et al., 2018, Makar et al., 2022,
Part of this work was done while Francesco Quinzan visited the Max Planck Institute for Intelligent Systems, Tübingen, Germany.