causal estimate
A Critical Look at the Consistency of Causal Estimation with Deep Latent Variable Models
Using deep latent variable models in causal inference has attracted considerable interest recently, but an essential open question is whether they yield consistent causal estimates. While they have demonstrated promising results and theory exists for some simple model formulations, we also know that causal effects are not even identifiable in general with latent variables. We investigate this gap between theory and empirical results with analytical considerations and extensive experiments on multiple synthetic and real-world data sets, using the causal effect variational autoencoder (CEVAE) as a case study. While CEVAE seems to work reliably in some simple scenarios, it does not estimate the causal effect correctly with a misspecified latent variable or a complex data distribution, contrary to its original motivation. Hence, our results show that more attention should be paid to ensuring the correctness of causal estimates with deep latent variable models.
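A toy example helps explain the claim that causal effects "are not even identifiable in general with latent variables". The sketch below is our own construction, not an example from the paper. It builds two models over binary variables with a latent U. In model A, U drives both T and Y, and T has no effect. In model B, T is a fair coin and Y copies T. After marginalizing U, both models produce the same observed distribution P(T, Y), yet their average treatment effects are 0 and 1. No estimator that sees only (T, Y), whether CEVAE or anything else, can tell the two apart without further assumptions.

```python
from itertools import product

# A model is (P(U=u), P(T=1 | U=u), P(Y=1 | T=t, U=u)) for binary U, T, Y.

def model_a():
    """U confounds T and Y; T has no causal effect on Y."""
    return {0: 0.5, 1: 0.5}, (lambda u: float(u)), (lambda t, u: float(u))

def model_b():
    """U is irrelevant; T is a fair coin and Y copies T (effect = 1)."""
    return {0: 0.5, 1: 0.5}, (lambda u: 0.5), (lambda t, u: float(t))

def bernoulli(p1, value):
    """P(X = value) for a binary X with P(X = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1

def observational_joint(model):
    """P(T=t, Y=y) with the latent U marginalized out."""
    p_u, p_t1, p_y1 = model
    return {
        (t, y): sum(pu * bernoulli(p_t1(u), t) * bernoulli(p_y1(t, u), y)
                    for u, pu in p_u.items())
        for t, y in product((0, 1), repeat=2)
    }

def average_treatment_effect(model):
    """E[Y | do(T=1)] - E[Y | do(T=0)]: fix T, keep P(U) and P(Y | T, U)."""
    p_u, _, p_y1 = model
    def mean_y(t):
        return sum(pu * p_y1(t, u) for u, pu in p_u.items())
    return mean_y(1) - mean_y(0)

print(observational_joint(model_a()) == observational_joint(model_b()))  # True
print(average_treatment_effect(model_a()), average_treatment_effect(model_b()))  # 0.0 1.0
```

Both models put probability 0.5 on (T=0, Y=0) and 0.5 on (T=1, Y=1). The data are therefore equally consistent with "no effect" and "full effect". Only modelling assumptions about U can break the tie, which is why a misspecified latent variable can yield a wrong estimate.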
Author Feedback
We thank the reviewers for their insightful and constructive feedback, to which we have carefully responded below. We have made these points clear in our revision. We agree that the statement needs to be more precise. We will expand the discussion of causal identifiability, listing settings and examples where it might be feasible. We will also add more ablations showing when noise helps or hurts.
Methods for Inferring Causality
In our previous article, Part 1: Getting started with Causal Inference, we covered the basics of causal inference and gave a lot of attention to regression. We also noted that regression is not the only way to close backdoor paths in a causal estimation design. In this article, we discuss some other methods, all aiming at the same goal: making the treatment and control groups similar in everything except the treatment itself. The goal of matching is to reduce the bias of the estimated treatment effect in an observational study. It does this by finding, for every treated unit, one or more untreated units with similar observable characteristics, so that the covariates are balanced between the groups. Suppose some confounder, say age, affects both the treatment and the outcome and makes the treatment and control groups incomparable. We can make them comparable by matching each treated unit with a similar unit from the control group.
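The age example can be made concrete with a minimal sketch of one-nearest-neighbour matching, with replacement, on a single confounder. It estimates the average treatment effect on the treated (ATT). The data are hypothetical and constructed so that the outcome is 0.5 × age, plus 3 for treated units. Treated units are older, so a naive comparison of means mixes the age effect into the treatment effect. The function names are illustrative only.

```python
def naive_difference(treated, control):
    """Difference in mean outcomes, ignoring the confounder."""
    def mean(units):
        return sum(y for _, y in units) / len(units)
    return mean(treated) - mean(control)


def nearest_neighbor_att(treated, control):
    """ATT via 1-nearest-neighbour matching on age, with replacement.

    Each unit is an (age, outcome) pair; age is the only confounder assumed here.
    """
    diffs = []
    for age_t, y_t in treated:
        # Pick the control unit closest in age (first one wins on ties).
        _, y_c = min(control, key=lambda unit: abs(unit[0] - age_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


# Hypothetical data: outcome = 0.5 * age, plus 3 if treated; treated units are older.
treated = [(40, 23.0), (50, 28.0), (60, 33.0)]
control = [(20, 10.0), (30, 15.0), (40, 20.0), (50, 25.0), (60, 30.0)]

print(naive_difference(treated, control))      # 8.0 (confounded by age)
print(nearest_neighbor_att(treated, control))  # 3.0 (the built-in effect)
```

Matching with replacement lets one control unit serve several treated units. This improves balance at the cost of some variance. With several covariates, one would match on a distance such as Mahalanobis distance or on an estimated propensity score instead of raw age.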
Pricing Engine: Estimating Causal Impacts in Real World Business Settings
Goldman, Matt; Quistorff, Brian
The explosion of data science in modern technology firms has created a new class of workers with the technical backgrounds needed to solve a wide array of statistical problems using a diverse set of machine learning (ML) techniques. However, the most important decisions made by such firms are typically policy questions such as "How much should we invest in R&D?", "Should we cut prices?", or "Which product would benefit most from an aggressive marketing campaign?". These questions all hinge on understanding the causal effect of various policy interventions. As such, they cannot be answered (or even well informed) by purely statistical approaches. Instead, they require econometric techniques that can yield answers with a clear causal interpretation. Causal inference is about understanding the true effect of a treatment, call it 'D', on an outcome, call it 'Y': how would Y change if we changed D? ML, on the other hand, is usually about building a good predictor function of Y using many features X (which may include D).
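A small simulation can make the prediction-versus-causation distinction concrete. It is our own sketch and is not claimed to be the Pricing Engine's estimator. An observed confounder X drives both D and Y. Regressing Y on D alone gives a slope that is fine for prediction but overstates the effect of changing D. We then residualize both D and Y on X and regress residual on residual. This is the linear Frisch-Waugh-Lovell partialling-out, the linear special case of the residual-on-residual idea behind double machine learning. It recovers the effect built into the simulation. The coefficients, the sample size, and the assumption that X captures all confounding are illustrative choices.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a least-squares fit of y on x with an intercept."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def residualize(v, x):
    """Residuals of v after a linear regression on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                      # observed confounder X
d = 0.8 * x + rng.normal(size=n)            # treatment D depends on X
y = 1.5 * d + 2.0 * x + rng.normal(size=n)  # assumed true effect of D on Y is 1.5

naive = ols_slope(d, y)                                     # about 2.48: good predictor, biased effect
partialled = ols_slope(residualize(d, x), residualize(y, x))  # close to 1.5
print(round(naive, 2), round(partialled, 2))
```

Double ML replaces the two linear residualizing regressions with flexible ML models and adds cross-fitting. The causal logic is the same: remove what X explains from both D and Y before relating them.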
The Blessings of Multiple Causes
Causal inference from observational data often assumes "strong ignorability," that is, that all confounders are observed. This assumption is standard yet untestable. However, many scientific studies involve multiple causes, different variables whose effects are simultaneously of interest. We propose the deconfounder, an algorithm that combines unsupervised machine learning and predictive model checking to perform causal inference in multiple-cause settings. The deconfounder infers a latent variable as a substitute for unobserved confounders and then uses that substitute to perform causal inference. We develop theory for when the deconfounder leads to unbiased causal estimates, and show that it requires weaker assumptions than classical causal inference. We analyze its performance in three types of studies: semi-simulated data on smoking and lung cancer, semi-simulated data on genome-wide association studies, and a real dataset about actors and movie revenue. The deconfounder provides a checkable approach to estimating close-to-truth causal effects.
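The two deconfounder steps can be sketched under strong simplifying assumptions. The sketch is not the authors' implementation. First, a factor model is fitted to the causes and its latent scores serve as a substitute confounder. Second, an outcome model is fitted that conditions on that substitute. PCA via SVD stands in for the probabilistic factor models used in the paper, and the predictive model check is omitted. The simulated data (one unobserved confounder z, 20 causes, and only cause 0 affecting y with effect 2.0) are hypothetical. The outcome model includes only the cause of interest plus the substitute, because a PCA substitute is an exact linear function of all causes and would make the full design matrix collinear.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5000, 20
z = rng.normal(size=n)                            # unobserved confounder
a = z[:, None] + rng.normal(size=(n, m))          # m causes, each driven by z
y = 2.0 * a[:, 0] + 3.0 * z + rng.normal(size=n)  # only cause 0 affects y (effect 2.0)

def ols(design, target):
    """Least-squares slopes of target on design, with an intercept column."""
    X = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]

# Step 1: fit a one-factor model to the causes (PCA as a stand-in).
a_centered = a - a.mean(axis=0)
_, _, vt = np.linalg.svd(a_centered, full_matrices=False)
z_hat = a_centered @ vt[0]   # first principal-component scores = substitute confounder

# Step 2: outcome model that conditions on the substitute confounder.
naive = ols(a[:, [0]], y)[0]                                   # confounded, near 3.5
deconfounded = ols(np.column_stack([a[:, 0], z_hat]), y)[0]    # near 2.0
print(round(naive, 2), round(deconfounded, 2))
```

With many causes that share a single confounder, the first principal component tracks z closely, so conditioning on it removes most of the confounding bias. With few causes, or with confounders that affect only one cause, the substitute is much less reliable. The paper's predictive model check is meant to flag factor models that fit the causes poorly.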