Current machine learning systems operate, almost exclusively, in a statistical, or model-free mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, cannot serve as the basis for strong AI. To achieve human level intelligence, learning machines need the guidance of a model of reality, similar to the ones used in causal inference tasks. To demonstrate the essential role of such models, I will present a summary of seven tasks which are beyond reach of current machine learning systems and which have been accomplished using the tools of causal modeling.
We provide a conceptual map to navigate causal analysis problems. Focusing on the case of discrete random variables, we consider the case of causal effect estimation from observational data. The presented approaches apply also to continuous variables, but the issue of estimation becomes more complex. We then introduce the four schools of thought for causal analysis
Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure (randomized or not) on an outcome interest. Even studies that are seemingly non-causal (e.g. prediction or prevalence estimation) have causal elements, such as differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in the data. In this work, we review the "Causal Roadmap", a formal framework to augment our traditional statistical analyses in an effort to answer the causal questions driving our research. Specific steps of the Roadmap include clearly stating the scientific question, defining of the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to translate the causal parameter into a statistical estimand, implementation of statistical estimators including parametric and semi-parametric methods, and interpretation of our findings. Throughout we focus on the effect of an exposure occurring at a single time point and provide extensions to more advanced settings.
Causal inference goes beyond prediction by modeling the outcome of interventions and formalizing counterfactual reasoning. In this blog post, I provide an introduction to the graphical approach to causal inference in the tradition of Sewell Wright, Judea Pearl, and others. We first rehash the common adage that correlation is not causation. We then move on to climb what Pearl calls the "ladder of causal inference", from association (seeing) to intervention (doing) to counterfactuals (imagining). We will discover how directed acyclic graphs describe conditional (in)dependencies; how the do-calculus describes interventions; and how Structural Causal Models allow us to imagine what could have been. This blog post is by no means exhaustive, but should give you a first appreciation of the concepts that surround causal inference; references to further readings are provided below. Messerli (2012) published a paper entitled "Chocolate Consumption, Cognitive Function, and Nobel Laureates" in The New England Journal of Medicine showing a strong positive relationship between chocolate consumption and the number of Nobel Laureates. I have found an even stronger relationship using updated data2, as visualized in the figure below. Now, except for people in the chocolate business, it would be quite a stretch to suggest that increasing chocolate consumption would increase the number Nobel Laureates. Correlation does not imply causation because it does not constrain the possible causal relations enough. If two random variables $X$ and $Y$ are statistically dependent ($X \perp Y$), then either (a) $X$ causes $Y$, (b) $Y$ causes $X$, or (c) there exists a third variable $Z$ that causes both $X$ and $Y$. Further, $X$ and $Y$ become independent given $Z$, i.e., $X \perp Y \mid Z$. An in principle straightforward way to break this uncertainty is to conduct an experiment: we could, for example, force the citizens of Austria to consume more chocolate, and study whether this increases the number of Nobel laureates in the following years.
This paper introduces Whittemore, a language for causal programming. Causal programming is based on the theory of structural causal models and consists of two primary operations: identification, which finds formulas that compute causal queries, and estimation, which applies formulas to transform probability distributions to other probability distribution. Causal programming provides abstractions to declare models, queries, and distributions with syntax similar to standard mathematical notation, and conducts rigorous causal inference, without requiring detailed knowledge of the underlying algorithms. Examples of causal inference with real data are provided, along with discussion of the implementation and possibilities for future extension.