Causal inference is central to many areas of artificial intelligence, including complex reasoning, planning, knowledge-base construction, robotics, explanation, and fairness. An active community of researchers develops and enhances algorithms that learn causal models from data, and this work has produced a series of impressive technical advances. However, evaluation techniques for causal modeling algorithms have remained somewhat primitive, limiting what we can learn from experimental studies of algorithm performance, constraining the types of algorithms and model representations that researchers consider, and creating a gap between theory and practice. We argue for more frequent use of evaluation techniques that examine interventional measures rather than structural or observational measures, and that evaluate those measures on empirical data rather than synthetic data. We survey the current practice in evaluation and show that these are rarely used in practice. We show that such techniques are feasible and that data sets are available to conduct such evaluations. We also show that these techniques produce substantially different results than using structural measures and synthetic data.
Inferring the causal direction and causal effect between two discrete random variables X and Y from a finite sample is often a crucial problem and a challenging task. However, if we have access to observational and interventional data, it is possible to solve that task. If X is causing Y, then it does not matter if we observe an effect in Y by observing changes in X or by intervening actively on X. This invariance principle creates a link between observational and interventional distributions in a higher dimensional probability space. We embed distributions that originate from samples of X and Y into that higher dimensional space such that the embedded distribution is closest to the distributions that follow the invariance principle, with respect to the relative entropy. This allows us to calculate the best information-theoretic approximation for a given empirical distribution, that follows an assumed underlying causal model. We show that this information-theoretic approximation to causal models (IACM) can be done by solving a linear optimization problem. In particular, by approximating the empirical distribution to a monotonic causal model, we can calculate probabilities of causation. We can also use IACM for causal discovery problems in the bivariate, discrete case. However, experimental results on labeled synthetic data from additive noise models show that our causal discovery approach is lagging behind state-of-the-art approaches because the invariance principle encodes only a necessary condition for causal relations. Nevertheless, for synthetic multiplicative noise data and real-world data, our approach can compete in some cases with alternative methods.
We address the problem of transferring causal knowledge learned in one environment to another, potentially different environment, when only limited experiments may be conducted at the source. This generalizes the treatment of transportability introduced in [Pearl and Bareinboim, 2011; Bareinboim and Pearl, 2012b], which deals with transferring causal information when any experiment can be conducted at the source. Given that it is not always feasible to conduct certain controlled experiments, we consider the decision problem whether experiments on a selected subset Z of variables together with qualitative assumptions encoded in a diagram may render causal effects in the target environment computable from the available data. This problem, which we call z-transportability, reduces to ordinary transportability when Z is all-inclusive, and, like the latter, can be given syntactic characterization using the do-calculus [Pearl, 1995; 2000]. This paper establishes a necessary and sufficient condition for causal effects in the target domain to be estimable from both the non-experimental information available and the limited experimental information transferred from the source. We further provides a complete algorithm for computing the transport formula, that is, a way of fusing experimental and observational information to synthesize an unbiased estimate of the desired causal relation.
We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset.
Causal discovery, the task of automatically constructing a causal model from data, is of major significance across the sciences. Evaluating the performance of causal discovery algorithms should ideally involve comparing the inferred models to ground-truth models available for benchmark datasets, which in turn requires a notion of distance between causal models. While such distances have been proposed previously, they are limited by focusing on graphical properties of the causal models being compared. Here, we overcome this limitation by defining distances derived from the causal distributions induced by the models, rather than exclusively from their graphical structure. Pearl and Mackenzie (2018) have arranged the properties of causal models in a hierarchy called the "ladder of causation" spanning three rungs: observational, interventional, and counterfactual. Following this organization, we introduce a hierarchy of three distances, one for each rung of the ladder. Our definitions are intuitively appealing as well as efficient to compute approximately. We put our causal distances to use by benchmarking standard causal discovery systems on both synthetic and real-world datasets for which ground-truth causal models are available. Finally, we highlight the usefulness of our causal distances by briefly discussing further applications beyond the evaluation of causal discovery techniques.