Causal Discovery as Semi-Supervised Learning
Oates, Chris. J., Mukherjee, Sach
In this short report, we discuss an approach to estimating causal graphs in which indicators of causal influence between variables are treated as labels in a machine learning formulation. Available data on the variables of interest are used as "inputs" to estimate the labels. We frame the problem as one of semi-supervised learning: available interventional data or background knowledge provide labels on some edges in the graph and the remaining edges are treated as unlabelled objects. To illustrate the key ideas, we consider a simple approach to feature construction (rooted in bivariate kernel density estimation) and embed this within a semi-supervised manifold framework. Results on yeast knockout data demonstrate that the proposed approach can identify causal relationships as validated by unseen interventional experiments. An advantage of the formulation we propose is that by reframing causal discovery as semi-supervised learning, it allows a range of data-driven approaches to be brought to bear on causal discovery, without demanding specification of full probability models or explicit models of underlying mechanisms.
Dec-16-2016