
Collaborating Authors

 Shpitser, Ilya


Entropic Inequality Constraints from $e$-separation Relations in Directed Acyclic Graphs with Hidden Variables

arXiv.org Machine Learning

Directed acyclic graphs (DAGs) with hidden variables are often used to characterize causal relations between variables in a system. When some variables are unobserved, DAGs imply a notoriously complicated set of constraints on the distribution of observed variables. In this work, we present entropic inequality constraints that are implied by $e$-separation relations in hidden variable DAGs with discrete observed variables. The constraints can intuitively be understood to follow from the fact that the capacity of variables along a causal pathway to convey information is restricted by their entropy; e.g., in the extreme case, a variable with entropy $0$ can convey no information. We show how these constraints can be used to learn about the true causal model from an observed data distribution. In addition, we propose a measure of causal influence called the minimal mediary entropy, and demonstrate that it can augment traditional measures such as the average causal effect.
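The intuition in the abstract above, that a pathway cannot convey more information than the entropy of the variables on it, can be checked numerically in the simplest case. The sketch below is not the paper's $e$-separation constraint; it only illustrates the standard data processing bound $I(X;Y) \le H(M)$ for a fully observed chain X -> M -> Y. The chain and all probability tables are made-up values chosen for illustration.

```python
import math
from itertools import product


def entropy(p):
    """Shannon entropy in bits of a distribution given as {value: prob}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def mutual_information(joint):
    """Mutual information I(X;Y) in bits from a joint table {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return sum(q * math.log2(q / (px[x] * py[y]))
               for (x, y), q in joint.items() if q > 0)


# Hypothetical chain X -> M -> Y (all numbers are illustrative assumptions).
p_x = {x: 0.25 for x in range(4)}
p_m_given_x = {x: ({0: 0.9, 1: 0.1} if x < 2 else {0: 0.2, 1: 0.8})
               for x in range(4)}
p_y_given_m = {0: {0: 0.7, 1: 0.2, 2: 0.1},
               1: {0: 0.1, 1: 0.3, 2: 0.6}}

# Marginalize the full joint p(x, m, y) down to p(x, y) and p(m).
joint_xy, p_m = {}, {}
for x, m, y in product(range(4), range(2), range(3)):
    q = p_x[x] * p_m_given_x[x][m] * p_y_given_m[m][y]
    joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + q
    p_m[m] = p_m.get(m, 0.0) + q

h_m = entropy(p_m)                 # entropy of the binary mediator, at most 1 bit
i_xy = mutual_information(joint_xy)  # information that X carries about Y
print(f"H(M) = {h_m:.4f} bits, I(X;Y) = {i_xy:.4f} bits")
```

Because M is binary, H(M) is at most one bit, so X and Y cannot share more than one bit of information even though each takes more than two values. A mediator with a single possible value has entropy 0 and would force I(X;Y) = 0.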


Multiply Robust Causal Mediation Analysis with Continuous Treatments

arXiv.org Machine Learning

In many applications, researchers are interested in the direct and indirect causal effects of an intervention on an outcome of interest. Mediation analysis offers a rigorous framework for the identification and estimation of such causal quantities. In the case of binary treatment, efficient estimators for the direct and indirect effects are derived by Tchetgen Tchetgen and Shpitser (2012). These estimators are based on influence functions and possess desirable multiple robustness properties. However, they are not readily applicable when treatments are continuous, which is the case in several settings, such as drug dosage in medical applications. In this work, we extend the influence function-based estimator of Tchetgen Tchetgen and Shpitser (2012) to deal with continuous treatments by utilizing a kernel smoothing approach. We first demonstrate that our proposed estimator preserves the multiple robustness property of the estimator in Tchetgen Tchetgen and Shpitser (2012). Then we show that under certain mild regularity conditions, our estimator is asymptotically normal. Our estimation scheme allows for high-dimensional nuisance parameters that can be estimated at slower rates than the target parameter. Additionally, we utilize cross-fitting, which allows for weaker smoothness requirements for the nuisance functions.
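To make the kernel smoothing idea concrete: with a binary treatment, an inverse-probability weight uses the indicator that a unit received treatment level a; with a continuous treatment, that indicator can be replaced by a kernel centered at the level of interest. The sketch below shows only this weighting step for a single mean, not the multiply robust mediation estimator described in the abstract. The function names, the Gaussian kernel, the normalized form of the weights, and the toy data are all assumptions made for illustration, and the conditional density is treated as known.

```python
import math


def gaussian_kernel(u, h):
    """Scaled Gaussian kernel K_h(u) = K(u / h) / h with bandwidth h > 0."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def kernel_ipw_mean(a0, treatments, outcomes, densities, h):
    """Normalized kernel-weighted estimate of the mean outcome at dose a0.

    densities[i] stands for the conditional density p(A_i | X_i). It is
    assumed known here; in practice it is an estimated nuisance function.
    """
    weights = [gaussian_kernel(a - a0, h) / d
               for a, d in zip(treatments, densities)]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, outcomes)) / total


# Toy data (hypothetical): doses, outcomes, and assumed-known densities.
doses = [0.1, 0.4, 0.5, 0.6, 0.9]
responses = [1.0, 2.0, 2.2, 2.4, 3.0]
dens = [1.0, 1.0, 1.0, 1.0, 1.0]

estimate = kernel_ipw_mean(0.5, doses, responses, dens, h=0.2)
print(f"kernel-weighted mean at dose 0.5: {estimate:.3f}")
```

Units with doses near 0.5 dominate the estimate, and the bandwidth h controls how quickly the weights decay with distance from the target dose.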


Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals

arXiv.org Machine Learning

A moment function is called doubly robust if it is composed of two nuisance functions and the estimator based on it is a consistent estimator of the target parameter even if one of the nuisance functions is misspecified. In this paper, we consider a class of doubly robust moment functions originally introduced by Robins et al. (2008). We demonstrate that this moment function can be used to construct estimating equations for the nuisance functions. The main idea is to choose each nuisance function such that it minimizes the dependency of the expected value of the moment function on the other nuisance function. We implement this idea as a minimax optimization problem. We then provide conditions required for asymptotic linearity of the estimator of the parameter of interest, which are based on the convergence rate of the product of the errors of the nuisance functions, as well as the local ill-posedness of a conditional expectation operator. The convergence rates of the nuisance functions are analyzed using modern techniques in statistical learning theory based on the Rademacher complexity of the function spaces. We specifically focus on the case in which the function spaces are reproducing kernel Hilbert spaces, which enables us to use their spectral properties to analyze the convergence rates. As an application of the proposed methodology, we consider the average causal effect parameter both in the presence and in the absence of latent confounders. When latent confounders are present, we use the recently proposed proximal causal inference framework of Miao et al. (2018) and Tchetgen Tchetgen et al. (2020), and hence our results lead to a robust non-parametric estimator for the average causal effect in this framework.
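As a concrete instance of double robustness, the sketch below uses the familiar augmented inverse probability weighting (AIPW) moment for the average causal effect with a binary treatment. It is offered only as a well-known example of a doubly robust moment function; it is not the minimax kernel procedure of the abstract above. The toy rows are noise-free by construction, so that the outcome residuals vanish exactly and the effect of misspecifying the propensity score can be seen without sampling error.

```python
def aipw_score(y, a, pi, mu1, mu0):
    """AIPW moment for one unit with binary treatment a in {0, 1}.

    pi  : propensity score P(A = 1 | X)
    mu1 : outcome regression E[Y | A = 1, X]
    mu0 : outcome regression E[Y | A = 0, X]
    """
    return (mu1 - mu0
            + a * (y - mu1) / pi
            - (1 - a) * (y - mu0) / (1.0 - pi))


def aipw_estimate(rows):
    """Average of the AIPW moment over rows of (y, a, pi, mu1, mu0)."""
    return sum(aipw_score(*row) for row in rows) / len(rows)


# Hypothetical noise-free data: each outcome equals its regression value.
rows_correct_pi = [
    (2.0, 1, 0.5, 2.0, 1.0),
    (1.0, 0, 0.5, 2.5, 1.0),
    (3.0, 1, 0.5, 3.0, 1.5),
    (1.5, 0, 0.5, 3.0, 1.5),
]
# Same data with a deliberately wrong propensity score.
rows_wrong_pi = [(y, a, 0.9, mu1, mu0) for (y, a, _, mu1, mu0) in rows_correct_pi]

print(aipw_estimate(rows_correct_pi), aipw_estimate(rows_wrong_pi))
```

Both calls return the same value because a correct outcome regression makes the weighted residual terms zero, whatever the propensity score is. The symmetric case, a correct propensity score with a wrong outcome regression, holds only in expectation and is not shown here.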


Partial Identifiability in Discrete Data With Measurement Error

arXiv.org Machine Learning

When data contains measurement errors, it is necessary to make assumptions relating the observed, erroneous data to the unobserved true phenomena of interest. These assumptions should be justifiable on substantive grounds, but are often motivated by mathematical convenience, for the sake of exactly identifying the target of inference. We adopt the view that it is preferable to present bounds under justifiable assumptions than to pursue exact identification under dubious ones. To that end, we demonstrate how a broad class of modeling assumptions involving discrete variables, including common measurement error and conditional independence assumptions, can be expressed as linear constraints on the parameters of the model. We then use linear programming techniques to produce sharp bounds for factual and counterfactual distributions under measurement error in such models. We additionally propose a procedure for obtaining outer bounds on non-linear models. Our method yields sharp bounds in a number of important settings -- such as the instrumental variable scenario with measurement error -- for which no bounds were previously known.


Differentiable Causal Discovery Under Unmeasured Confounding

arXiv.org Machine Learning

The data drawn from biological, economic, and social systems are often confounded due to the presence of unmeasured variables. Prior work in causal discovery has focused on discrete search procedures for selecting acyclic directed mixed graphs (ADMGs), specifically ancestral ADMGs, that encode ordinary conditional independence constraints among the observed variables of the system. However, confounded systems also exhibit more general equality restrictions that cannot be represented via these graphs, placing a limit on the kinds of structures that can be learned using ancestral ADMGs. In this work, we derive differentiable algebraic constraints that fully characterize the space of ancestral ADMGs, as well as more general classes of ADMGs, arid ADMGs and bow-free ADMGs, that capture all equality restrictions on the observed variables. We use these constraints to cast causal discovery as a continuous optimization problem and design differentiable procedures to find the best fitting ADMG when the data comes from a confounded linear system of equations with correlated errors. We demonstrate the efficacy of our method through simulations and application to a protein expression dataset.
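For readers unfamiliar with differentiable graph constraints, the sketch below shows a widely used smooth acyclicity penalty for a weighted directed graph, in the style of the NOTEARS trace-of-matrix-exponential function. It covers only acyclicity of the directed part and is an assumption made for illustration: it is not claimed to be the constraint derived in this paper for ancestral, arid, or bow-free ADMGs. The series is truncated at the number of nodes d, which preserves the property that the penalty is zero exactly when the graph has no directed cycle.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(w):
    """Sum over k = 1..d of trace(S^k) / k!, where S is the elementwise square of w.

    w[i][j] is the weight of edge i -> j. trace(S^k) is positive exactly when
    some directed cycle of length k exists, and any cyclic graph on d nodes
    has a cycle of length at most d, so the truncated sum is zero iff w is a DAG.
    """
    d = len(w)
    s = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    penalty, factorial = 0.0, 1.0
    for k in range(1, d + 1):
        power = matmul(power, s)      # power now holds S^k
        factorial *= k
        penalty += sum(power[i][i] for i in range(d)) / factorial
    return penalty


# Hypothetical weight matrices on three nodes.
dag = [[0.0, 1.5, 0.0],
       [0.0, 0.0, -2.0],
       [0.0, 0.0, 0.0]]      # 0 -> 1 -> 2, no cycle
cyclic = [[0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0],
          [1.0, 0.0, 0.0]]   # 0 -> 1 -> 2 -> 0

print(acyclicity_penalty(dag), acyclicity_penalty(cyclic))
```

Because the penalty is a polynomial in the edge weights, it can be added to a likelihood objective and handled by gradient-based optimizers, which is the general idea behind casting structure learning as continuous optimization.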


Deriving Bounds and Inequality Constraints Using Logical Relations Among Counterfactuals

arXiv.org Machine Learning

Causal parameters may not be point identified in the presence of unobserved confounding. However, information about non-identified parameters, in the form of bounds, may still be recovered from the observed data in some cases. We develop a new general method for obtaining bounds on causal parameters using rules of probability and restrictions on counterfactuals implied by causal graphical models. We additionally provide inequality constraints on functionals of the observed data law implied by such causal models. Our approach is motivated by the observation that logical relations between identified and non-identified counterfactual events often yield information about non-identified events. We show that this approach is powerful enough to recover known sharp bounds and tight inequality constraints, and to derive novel bounds and constraints.
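A minimal worked example of the idea that logical relations between events yield bounds, assuming a binary outcome Y, a treatment A, and the consistency property that Y(a) = Y whenever A = a. The event {Y = 1, A = a} implies {Y(a) = 1}, and {Y(a) = 1} implies that either {Y = 1, A = a} or {A != a} occurred. These two implications give the classical assumption-free bounds computed below. This is a textbook special case chosen for illustration, not the general method of the paper, and the input probabilities are made up.

```python
def counterfactual_bounds(p_y1_and_a, p_a):
    """Assumption-free bounds on P(Y(a) = 1) for a binary outcome Y.

    p_y1_and_a : observed probability P(Y = 1, A = a)
    p_a        : observed probability P(A = a)
    """
    if not 0.0 <= p_y1_and_a <= p_a <= 1.0:
        raise ValueError("need 0 <= P(Y=1, A=a) <= P(A=a) <= 1")
    lower = p_y1_and_a                 # {Y=1, A=a} implies {Y(a)=1}
    upper = p_y1_and_a + (1.0 - p_a)   # {Y(a)=1} implies {Y=1, A=a} or {A != a}
    return lower, upper


# Hypothetical observed quantities.
low, high = counterfactual_bounds(p_y1_and_a=0.3, p_a=0.6)
print(f"{low:.2f} <= P(Y(a) = 1) <= {high:.2f}")
```

The width of the interval equals P(A != a), the share of units for which the counterfactual Y(a) is never observed, so the bounds tighten as more units receive treatment level a.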


A Semiparametric Approach to Interpretable Machine Learning

arXiv.org Machine Learning

Black-box models in machine learning have demonstrated excellent predictive performance in complex problems and high-dimensional settings. However, their lack of transparency and interpretability restricts the applicability of such models in critical decision-making processes. In order to combat this shortcoming, we propose a novel approach to trading off interpretability and performance in prediction models using ideas from semiparametric statistics, allowing us to combine the interpretability of parametric regression models with the performance of nonparametric methods. We achieve this by utilizing a two-piece model: the first piece is interpretable and parametric, to which a second, uninterpretable residual piece is added. The performance of the overall model is optimized using methods from the sufficient dimension reduction literature. Influence function based estimators are derived and shown to be doubly robust. This allows for use of approaches such as Double Machine Learning in estimating our model parameters. We illustrate the utility of our approach via simulation studies and a data application based on predicting the length of stay in the intensive care unit among surgery patients.


Explaining The Behavior Of Black-Box Prediction Algorithms With Causal Learning

arXiv.org Artificial Intelligence

We propose to explain the behavior of black-box prediction methods (e.g., deep neural networks trained on image pixel data) using causal graphical models. Specifically, we explore learning the structure of a causal graph where the nodes represent prediction outcomes along with a set of macro-level "interpretable" features, while allowing for arbitrary unmeasured confounding among these variables. The resulting graph may indicate which of the interpretable features, if any, are possible causes of the prediction outcome and which may be merely associated with prediction outcomes due to confounding. The approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors which are "difference-makers" in an interventionist sense. The resulting analysis may be useful in algorithm auditing and evaluation, by identifying features which make a causal difference to the algorithm's output.


Comment on "Blessings of Multiple Causes"

arXiv.org Machine Learning

This scenario is directly analogous to longitudinal causal inference problems with multiple time-varying treatments that contain time-varying confounders, variables that serve as confounders for some treatments and as mediators for other treatments. If there is an unmeasured confounder for the R-Y relationship (represented by V and the dashed arrows in Figure 1(a)), then conditioning on R fails to identify the direct effects of A on Y, because it opens a confounding pathway through V. See Hernan and Robins (2020) for an overview of these issues. The answer to the question posed in Appendix B of WB, "Can the causes be causally dependent among themselves?" is therefore "no." If they are causally dependent then the deconfounder, by dint of rendering the causes independent, breaks some of the structure among the causes A, and as was originally established in the time-varying treatment setting, this undermines the identification of joint effects of A on Y by covariate adjustment. Analysis of Lemma 4. This simple argument also serves as a counterexample to Lemma 4, which states that the deconfounder does not pick up any post-treatment variables and can be treated as a pre-treatment covariate. This is necessarily false whenever the causes are causally dependent among themselves, but it need not hold even if the causes are not causally dependent, see below. The proof of Lemma 4 in Appendix I states that "Inferring the substitute confounder Z


Optimal Training of Fair Predictive Models

arXiv.org Machine Learning

Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance, without requiring parametric models for high-dimensional feature vectors.