Feuerriegel, Stefan
Differentially Private Learners for Heterogeneous Treatment Effects
Schröder, Maresa, Melnychuk, Valentyn, Feuerriegel, Stefan
Patient data is widely used to estimate heterogeneous treatment effects and thus understand the effectiveness and safety of drugs. Yet, patient data includes highly sensitive information that must be kept private. In this work, we aim to estimate the conditional average treatment effect (CATE) from observational data under differential privacy. Specifically, we present DP-CATE, a novel framework for CATE estimation that is Neyman-orthogonal and further ensures differential privacy of the estimates. Our framework is highly general: it applies to any two-stage CATE meta-learner with a Neyman-orthogonal loss function, and any machine learning model can be used for nuisance estimation. We further provide an extension of our DP-CATE framework in which we employ RKHS regression to release the complete CATE function while ensuring differential privacy. We demonstrate the effectiveness of DP-CATE in various experiments with synthetic and real-world datasets. To the best of our knowledge, we are the first to provide a framework for CATE estimation that is both Neyman-orthogonal and differentially private.
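As a hedged illustration of such a two-stage, Neyman-orthogonal pipeline (the abstract does not specify the privacy mechanism; the DR-learner second stage, the pseudo-outcome clipping, and the Gaussian noise calibration below are illustrative assumptions, not the paper's algorithm):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

    def dp_cate_sketch(X, A, Y, X_query, epsilon=1.0, delta=1e-5, clip=1.0):
        # Stage 1: nuisance estimation (any ML model can be plugged in here)
        mu0 = GradientBoostingRegressor().fit(X[A == 0], Y[A == 0])
        mu1 = GradientBoostingRegressor().fit(X[A == 1], Y[A == 1])
        e_hat = np.clip(
            GradientBoostingClassifier().fit(X, A).predict_proba(X)[:, 1], 0.05, 0.95
        )
        # Stage 2: regress Neyman-orthogonal (DR-learner) pseudo-outcomes on X
        pseudo = (mu1.predict(X) - mu0.predict(X)
                  + A * (Y - mu1.predict(X)) / e_hat
                  - (1 - A) * (Y - mu0.predict(X)) / (1 - e_hat))
        tau = GradientBoostingRegressor().fit(X, np.clip(pseudo, -clip, clip))
        # Privatize the released estimates via Gaussian output perturbation;
        # the sensitivity proxy 2*clip/len(Y) is an illustrative assumption
        sigma = (2 * clip / len(Y)) * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
        return tau.predict(X_query) + np.random.normal(0, sigma, len(X_query))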
Efficient and Sharp Off-Policy Learning under Unobserved Confounding
Hess, Konstantin, Frauen, Dennis, Melnychuk, Valentyn, Feuerriegel, Stefan
We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. We thereby address a key limitation of standard policy learning, which assumes unconfoundedness, meaning that no unobserved factors influence both treatment assignment and outcomes. However, this assumption is often violated in practice, in which case standard policy learning produces biased estimates and thus leads to policies that can be harmful. To address this limitation, we employ causal sensitivity analysis and derive a statistically efficient estimator for a sharp bound on the value function under unobserved confounding. Our estimator has three advantages: (1) Unlike existing works, our estimator avoids unstable minimax optimization based on inverse propensity weighted outcomes. (2) Our estimator is statistically efficient. (3) We prove that our estimator leads to the optimal confounding-robust policy. Finally, we extend our theory to the related task of policy improvement under unobserved confounding, i.e., when a baseline policy such as the standard of care is available. We show in experiments with synthetic and real-world data that our method outperforms simple plug-in approaches and existing baselines. Our method is highly relevant for decision-making in settings where unobserved confounding can be problematic, such as healthcare and public policy.
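For intuition, a common way to formalize bounded unobserved confounding in causal sensitivity analysis is the marginal sensitivity model (whether this is the exact model used here is our assumption based on the abstract), which bounds the odds-ratio distortion between the nominal propensity e(x) and the full propensity e(x, u) by a parameter \Gamma \ge 1:

\[
\frac{1}{\Gamma} \;\le\; \frac{e(x)\,/\,(1 - e(x))}{e(x, u)\,/\,(1 - e(x, u))} \;\le\; \Gamma .
\]

Under such a model, the policy value V(\pi) = \mathbb{E}[Y(\pi(X))] is only partially identified; the method then estimates the sharp lower bound V^{-}(\pi) efficiently and learns the confounding-robust policy \hat{\pi} = \arg\max_{\pi} \hat{V}^{-}(\pi).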
Orthogonal Representation Learning for Estimating Causal Quantities
Melnychuk, Valentyn, Frauen, Dennis, Schweisthal, Jonas, Feuerriegel, Stefan
Representation learning is widely used for estimating causal quantities (e.g., the conditional average treatment effect) from observational data. While existing representation learning methods have the benefit of allowing for end-to-end learning, they do not have the favorable theoretical properties of Neyman-orthogonal learners, such as double robustness and quasi-oracle efficiency. Moreover, such representation learning methods often impose additional constraints, like balancing, which can even lead to inconsistent estimation. In this paper, we propose a novel class of Neyman-orthogonal learners for causal quantities defined at the representation level, which we call OR-learners. Our OR-learners have several practical advantages: they allow for consistent estimation of causal quantities based on any learned representation, while offering favorable theoretical properties, including double robustness and quasi-oracle efficiency. In multiple experiments, we show that, under certain regularity conditions, our OR-learners improve upon existing representation learning methods and achieve state-of-the-art performance. To the best of our knowledge, our OR-learners are the first to offer a unified framework combining representation learning methods and Neyman-orthogonal learners for the estimation of causal quantities.
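As a hedged sketch of the idea (the notation is ours and simplified; the paper defines its learners more generally), a DR-style Neyman-orthogonal pseudo-outcome defined at the representation level \phi = \Phi(x) reads

\[
\tilde{Y} \;=\; \hat{\mu}_1(\phi) - \hat{\mu}_0(\phi) \;+\; \frac{A\,\big(Y - \hat{\mu}_1(\phi)\big)}{\hat{e}(\phi)} \;-\; \frac{(1 - A)\,\big(Y - \hat{\mu}_0(\phi)\big)}{1 - \hat{e}(\phi)},
\]

where the nuisance functions \hat{\mu}_a and \hat{e} are fitted on the learned representation; regressing \tilde{Y} on \phi then yields a doubly robust estimate of the representation-level CATE.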
Quantifying Aleatoric Uncertainty of the Treatment Effect: A Novel Orthogonal Learner
Melnychuk, Valentyn, Feuerriegel, Stefan, van der Schaar, Mihaela
Estimating causal quantities from observational data is crucial for understanding the safety and effectiveness of medical treatments. However, to make reliable inferences, medical practitioners require not only estimates of averaged causal quantities, such as the conditional average treatment effect, but also an understanding of the randomness of the treatment effect as a random variable. This randomness is referred to as aleatoric uncertainty and is necessary for understanding the probability of benefit from treatment or quantiles of the treatment effect. Yet, the aleatoric uncertainty of the treatment effect has received surprisingly little attention in the causal machine learning community. To fill this gap, we aim to quantify the aleatoric uncertainty of the treatment effect at the covariate-conditional level, namely, the conditional distribution of the treatment effect (CDTE). Unlike average causal quantities, the CDTE is not point identifiable without strong additional assumptions. As a remedy, we employ partial identification to obtain sharp bounds on the CDTE and thereby quantify the aleatoric uncertainty of the treatment effect. We then develop a novel, orthogonal learner for the bounds on the CDTE, which we call the AU-learner. We further show that our AU-learner has several strengths: in particular, it satisfies Neyman-orthogonality and thus attains quasi-oracle efficiency. Finally, we propose a fully parametric deep learning instantiation of our AU-learner.
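For intuition, a standard way to bound the conditional CDF of the treatment effect \Delta = Y(1) - Y(0) without assumptions on the joint dependence of the potential outcomes are the conditional Makarov bounds (stated here as background; that these are the exact bounds used is our reading of the abstract):

\[
\sup_{y}\, \max\!\big\{ F_1(y \mid x) - F_0(y - \delta \mid x),\, 0 \big\} \;\le\; F_{\Delta}(\delta \mid x) \;\le\; 1 + \inf_{y}\, \min\!\big\{ F_1(y \mid x) - F_0(y - \delta \mid x),\, 0 \big\},
\]

where F_a(\cdot \mid x) denotes the conditional CDF of the potential outcome Y(a).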
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets
Wang, Yuxin, Schröder, Maresa, Frauen, Dennis, Schweisthal, Jonas, Hess, Konstantin, Feuerriegel, Stefan
Constructing confidence intervals (CIs) for the average treatment effect (ATE) from patient records is crucial to assess the effectiveness and safety of drugs. However, patient records typically come from different hospitals, thus raising the question of how multiple observational datasets can be effectively combined for this purpose. In our paper, we propose a new method that estimates the ATE from multiple observational datasets and provides valid CIs. Our method makes few assumptions about the observational datasets and is thus widely applicable in medical practice. The key idea of our method is that we leverage prediction-powered inference and thereby essentially 'shrink' the CIs, so that we offer more precise uncertainty quantification compared to naïve approaches. We further prove the unbiasedness of our method and the validity of our CIs. We confirm our theoretical results through various numerical experiments. Finally, we provide an extension of our method for constructing CIs from combinations of experimental and observational datasets.

Estimating the average treatment effect (ATE) together with confidence intervals (CIs) is relevant in many fields, such as medicine, where the ATE is used to assess the effectiveness and safety of drugs (Glass et al., 2013; Feuerriegel et al., 2024). Nowadays, there is growing interest in using observational datasets for this purpose, for example, electronic health records (EHRs) and clinical registries (Johnson et al., 2016; Corrigan-Curay et al., 2018; Hong, 2021). Importantly, such observational datasets typically originate from different hospitals, different health providers, or even different countries (Colnet et al., 2024), thus raising the question of how to construct CIs for ATE estimation from multiple observational datasets. Motivating example: During the COVID-19 pandemic, the effectiveness and safety of potential drugs and vaccines were often assessed from electronic health records that originated from different hospitals, in order to rapidly generate new evidence for treatment guidelines (Tacconelli et al., 2022). For example, one study (Wong et al., 2024) estimated the effect of nirmatrelvir/ritonavir (also known under the commercial name "paxlovid") on 28-day all-cause hospitalizations in patients with a COVID-19 diagnosis, using data obtained through a retrospective, multi-center study.
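For intuition, in its simplest mean-estimation form, prediction-powered inference evaluates a prediction model f on a large auxiliary dataset and debiases it with a small labeled dataset (how exactly this is adapted to the ATE across multiple hospitals is beyond the abstract; the display below is generic background, not the paper's full construction):

\[
\hat{\theta}^{\mathrm{PP}} \;=\; \frac{1}{N} \sum_{i=1}^{N} f(\tilde{X}_i) \;+\; \frac{1}{n} \sum_{j=1}^{n} \big( Y_j - f(X_j) \big),
\]

whose CI width is driven by \operatorname{Var}\!\big(f(\tilde{X})\big)/N + \operatorname{Var}\!\big(Y - f(X)\big)/n; the more accurate f is, the more the interval 'shrinks' relative to using the small labeled dataset alone.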
Slowing Down Forgetting in Continual Learning
Janetzky, Pascal, Schlagenhauf, Tobias, Feuerriegel, Stefan
A common challenge in continual learning (CL) is catastrophic forgetting, where the performance on old tasks drops after new, additional tasks are learned. In this paper, we propose a novel framework called ReCL to slow down forgetting in CL. Our framework exploits an implicit bias of gradient-based neural networks, due to which these networks converge to margin-maximization points. Such convergence points allow us to reconstruct old data from previous tasks, which we then combine with the current training data. Our framework is flexible and can be applied on top of existing, state-of-the-art CL methods to slow down forgetting. We further demonstrate the performance gain from our framework across a large series of experiments, including different CL scenarios (class-incremental, domain-incremental, and task-incremental learning), different datasets (MNIST, CIFAR10), and different network architectures. Across all experiments, we find large performance gains through ReCL. To the best of our knowledge, our framework is the first to address catastrophic forgetting by leveraging models in CL as their own memory buffers.
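As a hedged sketch of the reconstruction idea (illustrative only, not the exact ReCL procedure; the margin objective and hyperparameters below are assumptions): synthetic inputs are optimized so that a frozen model assigns them high-margin predictions, approximating the margin-maximization points the model converged to, and the result is mixed into the current task's training data.

    import torch

    def reconstruct_old_data(model, n_samples, n_classes, input_shape, steps=200, lr=0.1):
        model.eval()  # freeze the model; we only optimize the synthetic inputs
        x = torch.randn(n_samples, *input_shape, requires_grad=True)
        y = torch.randint(0, n_classes, (n_samples,))
        opt = torch.optim.Adam([x], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            logits = model(x)
            target = logits.gather(1, y[:, None]).squeeze(1)
            runner_up = logits.scatter(1, y[:, None], float("-inf")).max(dim=1).values
            (-(target - runner_up).mean()).backward()  # maximize the classification margin
            opt.step()
        return x.detach(), y  # proxy samples for the old task, to be replayed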
Learning Representations of Instruments for Partial Identification of Treatment Effects
Schweisthal, Jonas, Frauen, Dennis, Schröder, Maresa, Hess, Konstantin, Kilbertus, Niki, Feuerriegel, Stefan
Reliable estimation of treatment effects from observational data is important in many disciplines, such as medicine. However, estimation is challenging when unconfoundedness, a standard assumption in the causal inference literature, is violated. In this work, we leverage arbitrary (potentially high-dimensional) instruments to estimate bounds on the conditional average treatment effect (CATE). Our contributions are three-fold: (1) We propose a novel approach for partial identification through a mapping of instruments to a discrete representation space, so that we obtain valid bounds on the CATE. This is crucial for reliable decision-making in real-world applications. (2) We derive a two-step procedure that learns tight bounds using a tailored neural partitioning of the latent instrument space. As a result, we avoid instability issues due to numerical approximations or adversarial training. Furthermore, our procedure aims to reduce the estimation variance in finite-sample settings to yield more reliable estimates. (3) We show theoretically that our procedure obtains valid bounds while reducing estimation variance. We further perform extensive experiments to demonstrate its effectiveness across various settings. Overall, our procedure offers a novel path for practitioners to make use of potentially high-dimensional instruments (e.g., as in Mendelian randomization).
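As a hedged sketch of the partitioning step (the architecture and the straight-through Gumbel-softmax below are our assumptions, not necessarily the paper's exact construction): a small encoder maps a high-dimensional instrument Z to one of K discrete values, so that downstream bounds can be computed as if the instrument were discrete.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InstrumentPartitioner(nn.Module):
        def __init__(self, z_dim, k=5, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, k)
            )

        def forward(self, z, tau=0.5):
            # Hard one-hot assignment in the forward pass, soft gradients backward
            return F.gumbel_softmax(self.net(z), tau=tau, hard=True)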
DiffPO: A causal diffusion model for learning distributions of potential outcomes
Ma, Yuchen, Melnychuk, Valentyn, Schweisthal, Jonas, Feuerriegel, Stefan
Predicting potential outcomes of interventions from observational data is crucial for decision-making in medicine, but the task is challenging due to the fundamental problem of causal inference. Existing methods are largely limited to point estimates of potential outcomes with no uncertainty quantification; thus, the full information about the distributions of potential outcomes is typically ignored. In this paper, we propose a novel causal diffusion model called DiffPO, which is carefully designed for reliable inferences in medicine by learning the distribution of potential outcomes. In our DiffPO, we leverage a tailored conditional denoising diffusion model to learn complex distributions, where we address the selection bias through a novel orthogonal diffusion loss. Another strength of our DiffPO method is that it is highly flexible (e.g., it can also be used to estimate different causal quantities, such as the CATE). Across a wide range of experiments, we show that our method achieves state-of-the-art performance.
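As a hedged illustration of the overall shape of such a model (not DiffPO's actual loss: the orthogonal diffusion loss in the paper is more involved than the simple inverse-propensity reweighting used here), one training step of a conditional denoising diffusion model for potential outcomes could look as follows:

    import torch

    def diffusion_training_step(denoiser, y, x, a, e_hat, alphas_bar):
        # Sample a diffusion timestep and noise the observed outcomes
        t = torch.randint(0, len(alphas_bar), (y.shape[0],))
        ab = alphas_bar[t].unsqueeze(-1)
        eps = torch.randn_like(y)
        y_t = ab.sqrt() * y + (1 - ab).sqrt() * eps
        # The denoiser is conditioned on covariates x and treatment a
        eps_hat = denoiser(y_t, t, x, a)
        # Reweight by inverse propensities as one simple way to address selection bias
        w = a / e_hat + (1 - a) / (1 - e_hat)
        return (w * ((eps_hat - eps) ** 2).mean(dim=-1)).mean()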
Causal machine learning for predicting treatment outcomes
Feuerriegel, Stefan, Frauen, Dennis, Melnychuk, Valentyn, Schweisthal, Jonas, Hess, Konstantin, Curth, Alicia, Bauer, Stefan, Kilbertus, Niki, Kohane, Isaac S., van der Schaar, Mihaela
Causal machine learning (ML) offers flexible, data-driven methods for predicting treatment outcomes. Here, we present how methods from causal ML can be used to understand the effectiveness of treatments, thereby supporting the assessment of the effectiveness and safety of drugs. A key benefit of causal ML is that it allows for estimating individualized treatment effects, as well as personalized predictions of potential patient outcomes under different treatments. This offers granular insights into when treatments are effective, so that decision-making in patient care can be personalized to individual patient profiles. We further discuss how causal ML can be used in combination with both clinical trial data and real-world data such as clinical registries and electronic health records. We finally provide recommendations for the reliable use of causal ML in medicine. First published in Nature Medicine, 30, 958-968 (2024) by Springer Nature.

Assessing the effectiveness of treatments is crucial to ensure patient safety and personalize patient care. Recent innovations in machine learning (ML) offer new, data-driven methods to estimate treatment effects from data. This branch of ML is commonly referred to as causal ML, as it aims to predict a causal quantity, namely, the patient outcomes due to treatment [1]. Causal ML can be used to estimate treatment effects from both experimental data obtained through randomized controlled trials (RCTs) and observational data obtained from clinical registries, electronic health records, and other real-world data (RWD) sources, thereby generating clinical evidence. A key strength of causal ML is that it allows one to estimate individualized treatment effects, as well as to make personalized predictions of potential patient outcomes under different treatments.
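As a minimal, generic illustration of the individualized-treatment-effect idea described above (a textbook T-learner, not a method proposed in the article):

    from sklearn.ensemble import RandomForestRegressor

    def t_learner_cate(X, A, Y, X_new):
        # Fit one outcome model per treatment arm, then contrast their predictions
        mu0 = RandomForestRegressor().fit(X[A == 0], Y[A == 0])
        mu1 = RandomForestRegressor().fit(X[A == 1], Y[A == 1])
        return mu1.predict(X_new) - mu0.predict(X_new)  # per-patient effect estimate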
Stabilized Neural Prediction of Potential Outcomes in Continuous Time
Hess, Konstantin, Feuerriegel, Stefan
Patient trajectories from electronic health records are widely used to predict potential outcomes of treatments over time, which in turn allows care to be personalized. Yet, existing neural methods for this purpose have a key limitation: while some adjust for time-varying confounding, these methods assume that the time series are recorded in discrete time. In other words, they are constrained to settings where measurements and treatments are conducted at fixed time steps, even though this is unrealistic in medical practice. In this work, we aim to predict potential outcomes in continuous time. The latter is of direct practical relevance because it allows for modeling patient trajectories where measurements and treatments take place at arbitrary, irregular timestamps. We thus propose a new method called the stabilized continuous-time inverse propensity network (SCIP-Net). For this, we derive stabilized inverse propensity weights for robust prediction of the potential outcomes. To the best of our knowledge, our SCIP-Net is the first neural method that performs proper adjustments for time-varying confounding in continuous time.
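For intuition, the discrete-time analogue of the weights being stabilized is the standard marginal-structural-model form (stated here as background; the paper's continuous-time construction replaces the product by a product integral over treatment intensities):

\[
SW_t \;=\; \prod_{s \le t} \frac{\Pr\!\big(A_s \mid \bar{A}_{s-1}\big)}{\Pr\!\big(A_s \mid \bar{A}_{s-1}, \bar{H}_s\big)},
\]

where \bar{A}_{s-1} is the treatment history and \bar{H}_s additionally includes the time-varying covariate history; the numerator stabilizes the otherwise high-variance inverse propensity weights.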