
Collaborating Authors

Bühlmann, Peter


Treatment Effect Estimation with Observational Network Data using Machine Learning

arXiv.org Machine Learning

Classical causal inference approaches for treatment effect estimation with observational data usually assume independent units. This assumption is part of the common stable unit treatment value assumption (SUTVA) (Rubin, 1980). However, independence is often violated in practice due to interactions among units that lead to so-called spillover effects. For example, the vaccination against an infectious disease (treatment) of a person (unit) may not only influence this person's health status (outcome), but may also protect the health status of other people the person is interacting with (Perez-Heydrich et al., 2014; Sävje et al., 2021). In the presence of spillover effects, standard algorithms fail to separate correlation from causation: spurious associations due to network dependence contribute to the replication crisis (Lee and Ogburn, 2021) and may yield biased causal effect estimators and invalid inference (Sobel, 2006; Perez-Heydrich et al., 2014; Ogburn et al., 2022).
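
To make the spillover problem concrete, here is a minimal simulation sketch (purely illustrative, not the estimators developed in the paper): the exposure mapping (share of treated neighbours) and the linear outcome model are assumptions made only for this toy example. It shows that the naive treated-versus-control contrast recovers only the direct effect and misses the spillover component that matters for the effect of treating everyone.

import numpy as np

rng = np.random.default_rng(0)
n = 2000
A = np.triu(rng.binomial(1, 5 / n, size=(n, n)), 1)
A = A + A.T                                   # symmetric random network, ~5 neighbours per unit
deg = np.maximum(A.sum(axis=1), 1)

W = rng.binomial(1, 0.5, size=n)              # randomized treatment
exposure = A @ W / deg                        # assumed exposure mapping: share of treated neighbours
tau_direct, tau_spill = 1.0, 2.0
Y = tau_direct * W + tau_spill * exposure + rng.normal(size=n)

naive = Y[W == 1].mean() - Y[W == 0].mean()   # ignores interference: approximately tau_direct only
design = np.column_stack([np.ones(n), W, exposure])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
print(f"naive contrast: {naive:.2f}; effect of treating everyone vs. no one: {tau_direct + tau_spill}")
print(f"adjusted direct / spillover estimates: {coef[1]:.2f} / {coef[2]:.2f}")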


Random Forests for Change Point Detection

arXiv.org Machine Learning

We propose a novel multivariate nonparametric multiple change point detection method using classifiers. We construct a classifier log-likelihood ratio that uses class probability predictions to compare different change point configurations. We propose a computationally feasible search method, denoted changeforest, that is particularly well suited for random forests. However, the method can be paired with any classifier that yields class probability predictions, which we illustrate by also using a k-nearest neighbor classifier. We prove that it consistently locates change points in single change point settings when paired with a consistent classifier. Our proposed method changeforest achieves improved empirical performance in an extensive simulation study compared to existing multivariate nonparametric change point detection methods. An efficient implementation of our method is made available for R, Python, and Rust users in the changeforest software package.
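
As a rough illustration of the classifier log-likelihood ratio idea (a simplified single-change-point sketch, not the search algorithm implemented in the changeforest package), the snippet below uses scikit-learn's random forest with out-of-bag class probabilities; the gain compares the classifier's fit against the class proportions implied by a no-change configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classifier_gain(X, s):
    """Gain of splitting at s versus no change, from out-of-bag class probabilities."""
    n = len(X)
    y = (np.arange(n) >= s).astype(int)       # pseudo-labels: before vs. after the candidate split
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0).fit(X, y)
    p = np.clip(rf.oob_decision_function_, 1e-6, 1 - 1e-6)
    prior = np.array([s / n, 1 - s / n])      # class proportions under the no-change configuration
    return np.sum(np.log(p[np.arange(n), y]) - np.log(prior[y]))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(1, 1, (100, 5))])   # change point at 100
candidates = np.arange(20, 181, 10)
gains = [classifier_gain(X, s) for s in candidates]
print("estimated change point:", candidates[int(np.argmax(gains))])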


Causality-oriented robustness: exploiting general additive interventions

arXiv.org Artificial Intelligence

Since distribution shifts are common in real-world applications, there is a pressing need for developing prediction models that are robust against such shifts. Existing frameworks, such as empirical risk minimization or distributionally robust optimization, either lack generalizability for unseen distributions or rely on postulated distance measures. Alternatively, causality offers a data-driven and structural perspective to robust predictions. However, the assumptions necessary for causal inference can be overly stringent, and the robustness offered by such causal models often lacks flexibility. In this paper, we focus on causality-oriented robustness and propose Distributional Robustness via Invariant Gradients (DRIG), a method that exploits general additive interventions in training data for robust predictions against unseen interventions, and naturally interpolates between in-distribution prediction and causality. In a linear setting, we prove that DRIG yields predictions that are robust among a data-dependent class of distribution shifts. Furthermore, we show that our framework includes anchor regression (Rothenhäusler et al., 2021) as a special case, and that it yields prediction models that protect against more diverse perturbations. We extend our approach to the semi-supervised domain adaptation setting to further improve prediction performance. Finally, we empirically validate our methods on synthetic simulations and on single-cell data.
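
Since the abstract notes that anchor regression (Rothenhäusler et al., 2021) arises as a special case, here is a hedged sketch of that special case rather than of DRIG itself; the data-generating process and the gamma values are assumptions chosen for illustration. It uses the standard computational trick of running OLS on data transformed by I - (1 - sqrt(gamma)) P_A, where P_A projects onto the anchor variables.

import numpy as np

def anchor_regression(X, Y, A, gamma):
    """OLS on data transformed by (I - (1 - sqrt(gamma)) * P_A), P_A = projection onto anchors A."""
    P = A @ np.linalg.pinv(A)                   # projection onto the column space of the anchors
    shrink = np.eye(len(Y)) - (1 - np.sqrt(gamma)) * P
    coef, *_ = np.linalg.lstsq(shrink @ X, shrink @ Y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 500
A = rng.normal(size=(n, 1))                     # anchor / environment variable (additive intervention)
H = rng.normal(size=n)                          # hidden confounder
X = 1.5 * A[:, 0] + H + rng.normal(size=n)
Y = 2.0 * X + H + 0.5 * A[:, 0] + rng.normal(size=n)
X = X.reshape(-1, 1)

for gamma in (1.0, 5.0, 25.0):                  # gamma = 1 is plain OLS; larger gamma protects against shifts in A
    print(gamma, anchor_regression(X, Y, A, gamma))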


On the Identifiability and Estimation of Causal Location-Scale Noise Models

arXiv.org Artificial Intelligence

We study the class of location-scale or heteroscedastic noise models (LSNMs), in which the effect $Y$ can be written as a function of the cause $X$ and a noise source $N$ independent of $X$, which may be scaled by a positive function $g$ over the cause, i.e., $Y = f(X) + g(X)N$. Despite the generality of the model class, we show that the causal direction is identifiable up to some pathological cases. To empirically validate these theoretical findings, we propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks. Both model the conditional distribution of $Y$ given $X$ as a Gaussian parameterized by its natural parameters. When the feature maps are correctly specified, we prove that our estimator is jointly concave, and a consistent estimator for the cause-effect identification task. Although the neural network does not inherit those guarantees, it can fit functions of arbitrary complexity, and reaches state-of-the-art performance across benchmarks.
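
The following is a simplified heuristic sketch of the cause-effect decision for an LSNM, not the feature-map or neural-network estimators from the paper: fit a heteroscedastic Gaussian model in each direction (mean and log-variance via random forests, an assumption made here for brevity) and pick the direction with the higher held-out log-likelihood.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def gaussian_score(cause, effect):
    """Held-out Gaussian log-likelihood of effect | cause with fitted mean and scale functions."""
    half = len(cause) // 2
    c_tr, c_te = cause[:half].reshape(-1, 1), cause[half:].reshape(-1, 1)
    e_tr, e_te = effect[:half], effect[half:]
    mean = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0).fit(c_tr, e_tr)
    resid2 = (e_tr - mean.oob_prediction_) ** 2              # out-of-bag residuals avoid overfitting
    logvar = RandomForestRegressor(n_estimators=200, random_state=0).fit(c_tr, np.log(resid2 + 1e-12))
    mu, var = mean.predict(c_te), np.exp(logvar.predict(c_te))
    return np.mean(-0.5 * np.log(2 * np.pi * var) - (e_te - mu) ** 2 / (2 * var))

rng = np.random.default_rng(3)
X = rng.normal(size=2000)
Y = np.sin(2 * X) + (0.2 + 0.5 * X ** 2) * rng.normal(size=2000)   # Y = f(X) + g(X) N
print("X -> Y" if gaussian_score(X, Y) > gaussian_score(Y, X) else "Y -> X")   # typically "X -> Y"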


TSCI: two stage curvature identification for causal inference with invalid instruments

arXiv.org Machine Learning

TSCI implements treatment effect estimation from observational data under invalid instruments in the R statistical computing environment. Existing instrumental variable approaches rely on arguably strong and untestable identification assumptions, which limits their practical application. TSCI does not require the classical instrumental variable identification conditions and is effective even if all instruments are invalid. TSCI implements a two-stage algorithm. In the first stage, machine learning is used to cope with nonlinearities and interactions in the treatment model. In the second stage, a space to capture the instrument violations is selected in a data-adaptive way. These violations are then projected out to estimate the treatment effect.
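
A heavily simplified sketch of the two-stage logic follows (it is not the TSCI package: the violation space is fixed here to a quadratic basis of Z instead of being selected data-adaptively, and the data-generating process is invented for illustration). Stage one fits the treatment model with a random forest; stage two projects the violation space out of the fitted values before estimating the effect.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 3000
Z = rng.normal(size=n)                              # instrument, possibly invalid
U = rng.normal(size=n)                              # unobserved confounder
D = np.sin(2 * Z) + Z + U + 0.5 * rng.normal(size=n)        # nonlinear treatment model
beta = 1.0
Y = beta * D + 0.5 * Z + U + rng.normal(size=n)              # 0.5 * Z is the instrument violation

# Stage 1: machine-learning fit of the treatment given the instrument (out-of-bag as a cross-fitting proxy).
Dhat = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0).fit(
    Z.reshape(-1, 1), D).oob_prediction_

# Stage 2: project a fixed violation space V = [1, Z, Z^2] out of Dhat, then estimate beta.
V = np.column_stack([np.ones(n), Z, Z ** 2])
coef, *_ = np.linalg.lstsq(V, Dhat, rcond=None)
Dres = Dhat - V @ coef                              # leftover curvature not explained by the violation space
print("TSCI-style estimate:", (Dres @ Y) / (Dres @ D))       # roughly recovers beta = 1.0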


repliclust: Synthetic Data for Cluster Analysis

arXiv.org Artificial Intelligence

repliclust is a software package for generating synthetic data sets for cluster analysis. Our approach is based on data set archetypes, high-level geometric descriptions from which the user can create many different data sets, each possessing the desired geometric characteristics. The architecture of our software is modular and object-oriented, decomposing data generation into algorithms for placing cluster centers, sampling cluster shapes, selecting the number of data points for each cluster, and assigning probability distributions to clusters.
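
Below is a small illustrative mock-up of that modular decomposition. It is not the repliclust API; all function names and parameters here are hypothetical, chosen only to show the pipeline of placing centers, sampling shapes, assigning sizes, and drawing the data.

import numpy as np

rng = np.random.default_rng(5)

def place_centers(k, dim, spread=10.0):
    return rng.uniform(-spread, spread, size=(k, dim))

def sample_shapes(k, dim, max_aspect=3.0):
    covs = []
    for _ in range(k):
        axes = rng.uniform(1.0, max_aspect, size=dim)         # axis lengths control the aspect ratio
        Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))       # random orientation
        covs.append(Q @ np.diag(axes ** 2) @ Q.T)
    return covs

def assign_sizes(k, n_total, imbalance=2.0):
    w = rng.uniform(1.0, imbalance, size=k)
    return np.maximum((n_total * w / w.sum()).astype(int), 1)

def generate(k=4, dim=2, n_total=1000):
    centers, covs, sizes = place_centers(k, dim), sample_shapes(k, dim), assign_sizes(k, n_total)
    X = np.vstack([rng.multivariate_normal(c, S, size=m) for c, S, m in zip(centers, covs, sizes)])
    y = np.repeat(np.arange(k), sizes)
    return X, y

X, y = generate()
print(X.shape, np.bincount(y))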


Characterization and Greedy Learning of Gaussian Structural Causal Models under Unknown Interventions

arXiv.org Machine Learning

We consider the problem of recovering the causal structure underlying observations from different experimental conditions when the targets of the interventions in each experiment are unknown. We assume a linear structural causal model with additive Gaussian noise and consider interventions that perturb their targets while maintaining the causal relationships in the system. Different models may entail the same distributions, offering competing causal explanations for the given observations. We fully characterize this equivalence class and offer identifiability results, which we use to derive a greedy algorithm called GnIES to recover the equivalence class of the data-generating model without knowledge of the intervention targets. In addition, we develop a novel procedure to generate semi-synthetic data sets with known causal ground truth but distributions closely resembling those of a real data set of choice. We leverage this procedure and evaluate the performance of GnIES on synthetic, real, and semi-synthetic data sets. Despite the strong Gaussian distributional assumption, GnIES is robust to an array of model violations and competitive in recovering the causal structure in small- to large-sample settings. We provide, in the Python packages "gnies" and "sempler", implementations of GnIES and our semi-synthetic data generation procedure.
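
To illustrate why equivalence classes arise and why interventions help (a worked toy calculation, not the GnIES algorithm): the forward model X -> Y and a suitably reparameterized backward model Y -> X entail exactly the same bivariate Gaussian distribution, but an intervention that inflates the noise variance of X, with the target treated as unknown, changes Var(Y) only under the forward model.

import numpy as np

b, sx2, sy2 = 2.0, 1.0, 1.0                      # forward model: X = eps_x, Y = b X + eps_y

def forward_cov(sx2_int):
    return np.array([[sx2_int, b * sx2_int],
                     [b * sx2_int, b ** 2 * sx2_int + sy2]])

# Backward model Y -> X with parameters matched to the observational distribution.
vy = b ** 2 * sx2 + sy2
c = b * sx2 / vy
vx = sx2 - c ** 2 * vy

def backward_cov(vx_int):
    return np.array([[c ** 2 * vy + vx_int, c * vy],
                     [c * vy, vy]])

print(np.allclose(forward_cov(sx2), backward_cov(vx)))        # True: identical observational covariance
# Intervene on X by inflating its noise variance so that Var(X) = 4 under both models;
# only the forward model changes Var(Y) (17 vs. 5), which breaks the equivalence.
print(forward_cov(4 * sx2)[1, 1], backward_cov(vx + 3 * sx2)[1, 1])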


A Fast Non-parametric Approach for Causal Structure Learning in Polytrees

arXiv.org Machine Learning

We study the problem of causal structure learning with no assumptions on the functional relationships and noise. We develop DAG-FOCI, a computationally fast algorithm for this setting that is based on the FOCI variable selection algorithm of Azadkia and Chatterjee (2019). DAG-FOCI requires no tuning parameter and outputs the parents and the Markov boundary of a response variable of interest. We provide high-dimensional guarantees of our procedure when the underlying graph is a polytree. Furthermore, we demonstrate the applicability of DAG-FOCI on real data from computational biology (Sachs et al., 2005) and illustrate the robustness of our methods to violations of assumptions.
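
DAG-FOCI inherits its tuning-free flavor from the rank-based dependence measures behind FOCI. As a minimal, self-contained taste of that family, the sketch below implements Chatterjee's unconditional rank coefficient xi; it is neither the conditional coefficient used by FOCI nor DAG-FOCI itself, and it assumes no ties in the data for simplicity.

import numpy as np

def xi_coefficient(x, y):
    order = np.argsort(x)                            # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1     # ranks of y in the x-sorted order
    n = len(x)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(6)
x = rng.normal(size=5000)
print(round(xi_coefficient(x, np.sin(3 * x) + 0.1 * rng.normal(size=5000)), 2))   # strong dependence
print(round(xi_coefficient(x, rng.normal(size=5000)), 2))                         # approximately 0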


Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements

arXiv.org Machine Learning

Traditionally, spline or kernel approaches in combination with parametric estimation are used to infer the linear coefficient (fixed effects) in a partially linear mixed-effects model (PLMM) for repeated measurements. Using machine learning algorithms allows us to incorporate more complex interaction structures and high-dimensional variables. We employ double machine learning to cope with the nonparametric part of the PLMM: the nonlinear variables are regressed out nonparametrically from both the linear variables and the response. This adjustment can be performed with any machine learning algorithm, for instance random forests. The adjusted variables satisfy a linear mixed-effects model, where the linear coefficient can be estimated with standard linear mixed-effects techniques. We prove that the estimated fixed effects coefficient converges at the parametric rate and is asymptotically Gaussian distributed and semiparametrically efficient. Empirical examples demonstrate our proposed algorithm. We present two simulation studies and analyze a dataset with repeated CD4 cell counts from HIV patients. Software code for our method is available in the R-package dmlalg.
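
Here is a simplified sketch of this residualize-then-fit recipe (not the dmlalg implementation, which uses proper cross-fitting and attains the efficient estimator): out-of-bag random-forest predictions stand in for cross-fitting, and statsmodels' MixedLM fits a random-intercept model on the adjusted variables; the data-generating process is made up for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
import statsmodels.api as sm

rng = np.random.default_rng(7)
G, m = 200, 5                                    # 200 subjects with 5 repeated measurements each
groups = np.repeat(np.arange(G), m)
b = np.repeat(rng.normal(0, 1, G), m)            # random intercepts
W = rng.normal(size=(G * m, 3))                  # nonlinear covariates
X = np.sin(W[:, 0]) + 0.5 * W[:, 1] + rng.normal(size=G * m)      # linear (fixed-effects) variable
beta = 1.5
Y = beta * X + np.cos(W[:, 0]) + W[:, 1] ** 2 + b + rng.normal(size=G * m)

def residualize(target):
    rf = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0).fit(W, target)
    return target - rf.oob_prediction_           # out-of-bag predictions mimic cross-fitting

X_res, Y_res = residualize(X), residualize(Y)
fit = sm.MixedLM(Y_res, X_res.reshape(-1, 1), groups=groups).fit()
print(fit.params)                                # the fixed-effects coefficient approximately recovers beta = 1.5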


Structure Learning for Directed Trees

arXiv.org Machine Learning

Knowing the causal structure of a system is of fundamental interest in many areas of science and can aid the design of prediction algorithms that work well under manipulations to the system. The causal structure becomes identifiable from the observational distribution under certain restrictions. To learn the structure from data, score-based methods evaluate different graphs according to the quality of their fits. However, for large nonlinear models, these rely on heuristic optimization approaches with no general guarantees of recovering the true causal structure. In this paper, we consider structure learning of directed trees. We propose a fast and scalable method based on Chu-Liu-Edmonds' algorithm, which we call causal additive trees (CAT). For the case of Gaussian errors, we prove consistency in an asymptotic regime with a vanishing identifiability gap. We also introduce a method for testing substructure hypotheses with asymptotic family-wise error rate control that is valid post-selection and in unidentified settings. Furthermore, we study the identifiability gap, which quantifies how much better the true causal model fits the observational distribution than alternative models, and prove that it is lower bounded by local properties of the causal model. Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods.
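
A hedged sketch of the overall recipe follows; the edge score below (a random-forest Gaussian log-likelihood gain) is a stand-in assumption rather than the paper's estimator. It scores every candidate edge and then lets Chu-Liu-Edmonds' algorithm, via networkx's maximum spanning arborescence, return the best directed tree.

import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(8)
n = 1000
X0 = rng.normal(size=n)
X1 = np.sin(2 * X0) + 0.3 * rng.normal(size=n)
X2 = X1 ** 2 + 0.3 * rng.normal(size=n)
X3 = np.abs(X0) + 0.3 * rng.normal(size=n)
X = np.column_stack([X0, X1, X2, X3])               # true tree: 0 -> 1 -> 2 and 0 -> 3

def edge_gain(i, j):
    """Per-sample Gaussian log-likelihood gain of regressing X_j on X_i with a random forest."""
    rf = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
    rf.fit(X[:, [i]], X[:, j])
    resid = X[:, j] - rf.oob_prediction_
    gain = 0.5 * np.log(np.var(X[:, j]) / np.var(resid))
    return max(float(gain), 0.0)                     # floor at 0 to keep edge weights nonnegative

d = X.shape[1]
G = nx.DiGraph()
G.add_weighted_edges_from((i, j, edge_gain(i, j)) for i in range(d) for j in range(d) if i != j)
tree = nx.maximum_spanning_arborescence(G)           # Chu-Liu-Edmonds
print(sorted(tree.edges()))                          # should typically recover [(0, 1), (0, 3), (1, 2)]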