Unmeasured Confounder




Assessment of the conditional exchangeability assumption in causal machine learning models: a simulation study

Portela, Gerard T., Gibbons, Jason B., Schneeweiss, Sebastian, Desai, Rishi J.

arXiv.org Machine Learning

Observational studies developing causal machine learning (ML) models for the prediction of individualized treatment effects (ITEs) seldom conduct empirical evaluations to assess the conditional exchangeability assumption. We aimed to evaluate the performance of these models under conditional exchangeability violations and the utility of negative control outcomes (NCOs) as a diagnostic. We conducted a simulation study to examine confounding bias in ITE estimates generated by causal forest and X-learner models under varying conditions, including the presence or absence of true heterogeneity. We simulated data to reflect real-world scenarios with differing levels of confounding, sample size, and NCO confounding structures. We then estimated and compared subgroup-level treatment effects on the primary outcome and the NCOs across settings with and without unmeasured confounding. When conditional exchangeability was violated, causal forest and X-learner models failed to recover true treatment effect heterogeneity and, in some cases, falsely indicated heterogeneity when there was none. NCOs successfully identified subgroups affected by unmeasured confounding. Even when the NCOs did not perfectly satisfy their ideal assumptions, they remained informative, flagging potential bias in subgroup-level estimates, though not always pinpointing the subgroup with the largest confounding. Violations of conditional exchangeability substantially limit the validity of ITE estimates from causal ML models in routinely collected observational data. NCOs serve as a useful empirical diagnostic for detecting subgroup-specific unmeasured confounding and should be incorporated into causal ML workflows to support the credibility of individualized inference.
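The NCO diagnostic described in this abstract can be illustrated with a minimal simulation (a sketch of the general idea, not the authors' simulation design): an unmeasured confounder distorts the treatment-outcome contrast in one subgroup, and because the negative control outcome cannot be affected by treatment, any nonzero "effect" on it flags that subgroup as confounded.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
g = rng.integers(0, 2, n)                  # subgroup indicator (0 or 1)
u = rng.normal(size=n)                     # unmeasured confounder
# u influences treatment assignment only in subgroup 1
t = rng.binomial(1, 1 / (1 + np.exp(-(u * g))))
y = 1.0 * t + u + rng.normal(size=n)       # true effect = 1 in both subgroups
nco = u + rng.normal(size=n)               # negative control: no treatment effect

def diff_in_means(outcome, treat, mask):
    """Naive treatment-control contrast within a subgroup."""
    return outcome[mask & (treat == 1)].mean() - outcome[mask & (treat == 0)].mean()

for sub in (0, 1):
    m = g == sub
    print(f"subgroup {sub}: effect on Y = {diff_in_means(y, t, m):.2f}, "
          f"effect on NCO = {diff_in_means(nco, t, m):.2f}")
```

In subgroup 0 the NCO contrast is near zero and the estimate on Y recovers the true effect of 1; in subgroup 1 the NCO contrast is clearly nonzero, flagging the same bias that inflates the estimate on Y.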



Partial Functional Dynamic Backdoor Diffusion-based Causal Model

Liu, Xinwen, Qian, Lei, Chen, Song Xi, Tang, Niansheng

arXiv.org Machine Learning

We introduce a Partial Functional Dynamic Backdoor Diffusion-based Causal Model (PFD-BDCM), specifically designed for causal inference in the presence of unmeasured confounders with spatial heterogeneity and temporal dependency. The proposed PFD-BDCM framework addresses the limitations of existing approaches by uniquely integrating models for complex spatio-temporal dynamics with the analysis of multi-resolution variables. Specifically, the framework systematically mitigates confounding bias by integrating valid backdoor adjustment sets into a diffusion-based sampling mechanism. Moreover, it accounts for the intricate dynamics of unmeasured confounders through the deployment of region-specific structural equations and conditional autoregressive processes, and accommodates variables observed at heterogeneous resolutions via basis expansions for functional data. Our theoretical analysis establishes error bounds for counterfactual estimates of PFD-BDCM, formally linking reconstruction accuracy to counterfactual fidelity under monotonicity assumptions on the structural equations and invertibility assumptions on the encoding functions. Empirical evaluations on synthetic datasets and real-world air pollution data demonstrate PFD-BDCM's superiority over existing methods.
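The backdoor adjustment sets this abstract builds on implement the classic identity P(y | do(t)) = Σ_z P(y | t, z) P(z). A toy discrete sketch (illustrating the identity itself, not the paper's diffusion-based sampler) shows the adjusted estimate recovering the true effect where the naive contrast is biased:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
z = rng.binomial(1, 0.5, n)                        # observed backdoor variable
t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))    # z influences treatment
y = rng.binomial(1, 0.2 + 0.3 * t + 0.4 * z)       # true effect of t on P(y=1) is 0.3

naive = y[t == 1].mean() - y[t == 0].mean()        # confounded by z

# backdoor adjustment: E[Y | do(t)] = sum_z E[Y | t, z] * P(z)
def do_mean(tv):
    return sum(y[(t == tv) & (z == zv)].mean() * (z == zv).mean() for zv in (0, 1))

adjusted = do_mean(1) - do_mean(0)
print(f"naive = {naive:.3f}, backdoor-adjusted = {adjusted:.3f}")  # adjusted ~ 0.30
```

The naive contrast absorbs the association that flows through z, while the adjusted estimate averages the within-stratum contrasts over the marginal distribution of z.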


Differentiable Cyclic Causal Discovery Under Unmeasured Confounders

Sethuraman, Muralikrishnna G., Fekri, Faramarz

arXiv.org Machine Learning

Understanding causal relationships between variables is fundamental across scientific disciplines. Most causal discovery algorithms rely on two key assumptions: (i) all variables are observed, and (ii) the underlying causal graph is acyclic. While these assumptions simplify theoretical analysis, they are often violated in real-world systems, such as biological networks. Existing methods that account for confounders either assume linearity or struggle with scalability. To address these limitations, we propose DCCD-CONF, a novel framework for differentiable learning of nonlinear cyclic causal graphs in the presence of unmeasured confounders using interventional data. Our approach alternates between optimizing the graph structure and estimating the confounder distribution by maximizing the log-likelihood of the data. Through experiments on synthetic data and real-world gene perturbation datasets, we show that DCCD-CONF outperforms state-of-the-art methods in both causal graph recovery and confounder identification. We also provide consistency guarantees for our framework, reinforcing its theoretical soundness.


Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions

Park, Soojin, Kang, Suyeon, Lee, Chioun

arXiv.org Machine Learning

Causal decomposition analysis aims to assess the effect of modifying risk factors on reducing social disparities in outcomes. Recently, this analysis has incorporated individual characteristics when modifying risk factors by utilizing optimal treatment regimes (OTRs). Since the newly defined individualized effects rely on the no omitted confounding assumption, developing sensitivity analyses to account for potential omitted confounding is essential. Moreover, OTRs and individualized effects are primarily based on binary risk factors, and no formal approach currently exists to benchmark the strength of omitted confounding using observed covariates for binary risk factors. To address this gap, we extend a simulation-based sensitivity analysis that simulates unmeasured confounders, addressing two sources of bias emerging from deriving OTRs and estimating individualized effects. Additionally, we propose a formal bounding strategy that benchmarks the strength of omitted confounding for binary risk factors. Using the High School Longitudinal Study 2009 (HSLS:09), we demonstrate this sensitivity analysis and benchmarking method.
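The core move in simulation-based sensitivity analysis, as described above, is to posit an unmeasured confounder of a given strength, simulate it, and see how far the effect estimate moves. A minimal sketch of that idea (generic, not the authors' OTR-specific procedure; the strength parameter `gamma` is a hypothetical sensitivity knob):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
t = rng.binomial(1, 0.5, n)
y = 2.0 * t + rng.normal(size=n)      # unconfounded data; true effect = 2

def biased_estimate(gamma):
    """Effect estimate if an unmeasured confounder U, whose mean differs by
    gamma between treatment arms and which shifts Y one-for-one, were omitted."""
    u = rng.normal(loc=gamma * t)     # simulated confounder of strength gamma
    return (y + u)[t == 1].mean() - (y + u)[t == 0].mean()

for gamma in (0.0, 0.5, 1.0):
    print(f"confounder strength {gamma}: naive estimate = {biased_estimate(gamma):.2f}")
```

Sweeping `gamma` traces out how strong the omitted confounding would need to be to explain away (or meaningfully distort) the estimated effect, which is the question a benchmarking strategy against observed covariates then calibrates.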


Sequential Treatment Effect Estimation with Unmeasured Confounders

Wang, Yingrong, Wu, Anpeng, Li, Baohong, Xiao, Ziyang, Xiong, Ruoxuan, Han, Qing, Kuang, Kun

arXiv.org Artificial Intelligence

This paper studies the cumulative causal effects of sequential treatments in the presence of unmeasured confounders. This is a critical issue in sequential decision-making scenarios where treatment decisions and outcomes dynamically evolve over time. Advanced causal methods apply transformers as a backbone to model such time sequences, which show superiority in capturing long-range temporal dependence and periodic patterns via attention mechanisms. However, even when they control for observed confounding, these estimators still suffer from unmeasured confounders, which influence both treatment assignments and outcomes. How to adjust for the latent confounding bias in sequential treatment effect estimation remains an open challenge. Therefore, we propose a novel Decomposing Sequential Instrumental Variable framework for CounterFactual Regression (DSIV-CFR), relying on a common negative control assumption. Specifically, an instrumental variable (IV) is a special negative control exposure, while the previous outcome serves as a negative control outcome. This allows us to recover the IVs latent in the observed variables and estimate sequential treatment effects via a generalized moment condition. We conducted experiments on four datasets and achieved strong performance in one- and multi-step prediction, which in turn supports identifying optimal treatments for dynamic systems.


Proximal Inference on Population Intervention Indirect Effect

Bai, Yang, Cui, Yifan, Sun, Baoluo

arXiv.org Machine Learning

Additionally, experiments have shown that depersonalization symptoms can arise as a reaction to alcohol consumption (Raimo et al., 1999), and they are increasingly recognized as a significant prognostic factor in the course of depression (Michal et al., 2024). Despite these findings, little research has explored the mediating role of depersonalization symptoms in the causal pathway from alcohol consumption to depression. In this paper, we propose a methodological framework to evaluate the indirect effect of alcohol consumption on depression, with depersonalization acting as a mediator. To ground our analysis, we use data from a cross-sectional survey conducted during the COVID-19 pandemic by Domínguez-Espinosa et al. (2023) as a running example. In observational studies, the population average causal effect (ACE) and the natural indirect effect (NIE) are the most commonly used measures of total and mediation effects, respectively, to compare the outcomes of different intervention policies. For instance, in our running example, these two measures compare the depression outcomes between individuals engaging in hazardous versus non-hazardous alcohol consumption. However, clinical practice imposes ethical constraints, as healthcare professionals would not prescribe harmful levels of alcohol consumption. As a result, hypothetical interventions involving dangerous exposure levels are unrealistic. To address this situation with potentially harmful exposure, Hubbard and Van der Laan (2008) propose the population intervention effect (PIE), which contrasts outcomes between the natural population and a hypothetical population where no one is exposed to the harmful exposure level.
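The contrast between the ACE and the PIE described above is easy to see in a toy potential-outcomes simulation (simulated data, not the survey data used in the paper): the ACE compares everyone exposed versus no one exposed, while the PIE compares the population as it is against the hypothetical population with the harmful exposure removed, so it scales with exposure prevalence.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
a = rng.binomial(1, 0.3, n)           # hazardous exposure, prevalence 30%
y0 = rng.normal(0.0, 1.0, n)          # potential outcome without exposure
y1 = y0 + 1.0                         # exposure raises the outcome by 1
y = np.where(a == 1, y1, y0)          # observed outcome

ace = (y1 - y0).mean()                # ACE: E[Y(1) - Y(0)]
pie = y.mean() - y0.mean()            # PIE: E[Y] - E[Y(0)], natural vs unexposed
print(f"ACE = {ace:.2f}, PIE = {pie:.2f}")  # PIE ~ prevalence * ACE
```

Because the PIE only removes exposure from those who are actually exposed, it answers the policy-relevant question without ever requiring a hypothetical intervention that assigns the harmful exposure.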


Falsification of Unconfoundedness by Testing Independence of Causal Mechanisms

Karlsson, Rickard K. A., Krijthe, Jesse H.

arXiv.org Machine Learning

Using observational studies to estimate treatment effects is a ubiquitous yet challenging task in many disciplines, such as medicine [Hernán and Robins, 2006] or social sciences [Athey and Imbens, 2017]. While there is a rich literature of methods for treatment effect estimation in the observational setting [Bang and Robins, 2005, Wager and Athey, 2018, Chernozhukov et al., 2018], all of these methods share the requirement that certain, often untestable, conditions must hold before a causal effect can be estimated. One such condition is the assumption of no unmeasured confounding, meaning that there are no unobserved factors, unaccounted for by the method, that influence both the treatment and the outcome of interest. If unmeasured confounders are present, our causal effect estimates are likely to be biased and inconsistent [Greenland et al., 1999]. This can have serious downstream consequences, such as unknowingly recommending a non-effective or, even worse, potentially harmful treatment policy. Unfortunately, without making further assumptions, it is in general impossible to verify all assumptions needed to identify treatment effects from observational data. In this work, we investigate a novel strategy for falsifying unconfoundedness. Specifically, we focus on the common scenario where observational datasets are collected from different heterogeneous sources, which we refer to as environments.