Understanding Causal Inference
This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available.
Because data science work is experimental and probabilistic in nature, data scientists are often faced with making inferences. This may require a shift in mindset, particularly when moving from "traditional statistical analysis to causal analysis of multivariate data". As Domino is committed to providing the platform and tools data scientists need to accelerate their work, we reached out to Addison-Wesley Professional (AWP) Pearson for permission to excerpt the "Causal Inference" chapter from the book. We appreciate the permission to provide the chapter excerpt below and to place the code within a complementary Domino project.
We've introduced [in the book] a couple of machine-learning algorithms and suggested that they can be used to produce clear, interpretable results. You've seen that logistic regression coefficients can be used to say how much more likely an outcome is to occur in conjunction with a feature (for binary features) or how much more likely an outcome is to occur per unit increase in a variable (for real-valued features). We'd like to make stronger statements. We'd like to say "If you increase a variable by a unit, then it will have the effect of making an outcome more likely."
These two interpretations of a regression coefficient are so similar on the surface that you may have to read them a few times to see the difference. The key is that in the first case, we're describing what usually happens in a system that we observe. In the second case, we're saying what will happen if we intervene in that system and disrupt it from its normal operation.
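To make the distinction concrete, here is a minimal Python sketch (not code from the book) that assumes a hypothetical toy system in which a confounder Z drives both a binary feature X and an outcome Y; every probability and helper name (`simulate`, `log_odds_ratio`) is made up for illustration. In expectation, the observational contrast P(Y=1 | X=1) - P(Y=1 | X=0) is 0.68 - 0.22 = 0.46, while setting X by intervention changes P(Y=1) by only 0.5 - 0.4 = 0.1, the effect built into the simulation. With a single binary feature and an intercept, the log odds ratio is exactly the coefficient an unregularized logistic regression of Y on X would fit, so the sketch shows the first interpretation overstating the second.

```python
import numpy as np


def simulate(n, rng, do_x=None):
    """Draw n units from a hypothetical system Z -> X, Z -> Y, X -> Y.

    do_x=None: X follows its natural mechanism (observational data).
    do_x=0 or 1: X is set by fiat for every unit (an intervention).
    """
    z = rng.random(n) < 0.5                         # hypothetical confounder
    if do_x is None:
        x = rng.random(n) < np.where(z, 0.8, 0.2)   # Z makes X more likely
    else:
        x = np.full(n, bool(do_x))                  # the intervention ignores Z
    p_y = 0.1 + 0.1 * x + 0.6 * z                   # true effect of X on P(Y=1) is +0.1
    y = rng.random(n) < p_y
    return x, y


def log_odds_ratio(x, y):
    """Log odds ratio of Y=1 between X=1 and X=0.

    With one binary feature plus an intercept, this equals the coefficient
    an unregularized logistic regression of Y on X would fit.
    """
    p1, p0 = y[x].mean(), y[~x].mean()
    return np.log(p1 / (1 - p1)) - np.log(p0 / (1 - p0))


rng = np.random.default_rng(0)
n = 100_000

# What usually happens in the system we observe
x, y = simulate(n, rng)
obs_diff = y[x].mean() - y[~x].mean()
obs_lor = log_odds_ratio(x, y)

# What happens if we intervene and set X ourselves
x1, y1 = simulate(n, rng, do_x=1)
x0, y0 = simulate(n, rng, do_x=0)
do_diff = y1.mean() - y0.mean()
do_lor = log_odds_ratio(np.concatenate([x1, x0]), np.concatenate([y1, y0]))

print(f"observational:  diff={obs_diff:.3f}, log-odds ratio={obs_lor:.3f}")
print(f"interventional: diff={do_diff:.3f}, log-odds ratio={do_lor:.3f}")
```

The gap between the two lines comes entirely from Z: in the observed data, units with X=1 tend to have Z=1, and Z raises P(Y=1) on its own.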
After we go through an example, we'll build up the mathematical and conceptual machinery to describe interventions. We'll cover how to go from a Bayesian network describing observational data to one that describes the effects of an intervention. We'll go through some classic approaches to estimating the effects of interventions, and finally we'll explain how to use machine-learning estimators to estimate the effects of interventions.
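As a hedged preview of that machinery (a sketch, not the book's code), the snippet below assumes a hypothetical three-node Bayesian network Z -> X, Z -> Y, X -> Y with made-up probability tables; the names `joint` and `prob_y1` are illustrative. Conditioning on X uses the network as observed. Intervening with do(X = x) replaces the factor P(X | Z) with a point mass at x, which amounts to cutting the Z -> X edge. Enumerating the joint distribution gives P(Y=1 | X=1) - P(Y=1 | X=0) = 0.68 - 0.22 = 0.46, but P(Y=1 | do(X=1)) - P(Y=1 | do(X=0)) = 0.5 - 0.4 = 0.1.

```python
from itertools import product

# Hypothetical conditional probability tables for the network Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}                                      # P(Z = z)
P_X_GIVEN_Z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}    # P_X_GIVEN_Z[z][x]


def p_y1_given_xz(x, z):
    """P(Y = 1 | X = x, Z = z); the direct effect of X is +0.1."""
    return 0.1 + 0.1 * x + 0.6 * z


def joint(z, x, y, do_x=None):
    """P(Z=z, X=x, Y=y) from the network's factorization.

    If do_x is given, P(x | z) is replaced by a point mass at do_x,
    i.e. the Z -> X edge is cut (graph surgery).
    """
    if do_x is None:
        p_x = P_X_GIVEN_Z[z][x]
    else:
        p_x = 1.0 if x == do_x else 0.0
    py1 = p_y1_given_xz(x, z)
    p_y = py1 if y == 1 else 1.0 - py1
    return P_Z[z] * p_x * p_y


def prob_y1(x_value, do=False):
    """P(Y=1 | X=x_value), or P(Y=1 | do(X=x_value)) when do=True."""
    do_x = x_value if do else None
    num = sum(joint(z, x_value, 1, do_x) for z in (0, 1))
    den = sum(joint(z, x_value, y, do_x) for z, y in product((0, 1), repeat=2))
    return num / den


print("observational: ", prob_y1(1) - prob_y1(0))
print("interventional:", prob_y1(1, do=True) - prob_y1(0, do=True))
```

For this particular graph, the interventional quantity coincides with the classic back-door adjustment: the sum over z of P(Y=1 | x, z) P(z).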
Nov-1-2019, 15:19:31 GMT