This paper describes a Bayesian method for combining an arbitrary mixture of observational and experimental data in order to learn causal Bayesian networks. Observational data are passively observed. Experimental data, such as that produced by randomized controlled trials, result from the experimenter manipulating one or more variables (typically randomly) and observing the states of other variables. The paper presents a Bayesian method for learning the causal structure and parameters of the underlying causal process that is generating the data, given that (1) the data contains a mixture of observational and experimental case records, and (2) the causal process is modeled as a causal Bayesian network. This learning method was applied using as input various mixtures of experimental and observational data that were generated from the ALARM causal Bayesian network. In these experiments, the absolute and relative quantities of experimental and observational data were varied systematically. For each of these training datasets, the learning method was applied to predict the causal structure and to estimate the causal parameters that exist among randomly selected pairs of nodes in ALARM that are not confounded. The paper reports how these structure predictions and parameter estimates compare with the true causal structures and parameters as given by the ALARM network.
We provide a conceptual map to navigate causal analysis problems. Focusing on the case of discrete random variables, we consider the case of causal effect estimation from observational data. The presented approaches apply also to continuous variables, but the issue of estimation becomes more complex. We then introduce the four schools of thought for causal analysis
Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience.
A very important topic in systems biology is developing statistical methods that automatically find causal relations in gene regulatory networks with no prior knowledge of causal connectivity. Many methods have been developed for time series data. However, discovery methods based on steady-state data are often necessary and preferable since obtaining time series data can be more expensive and/or infeasible for many biological systems. A conventional approach is causal Bayesian networks. However, estimation of Bayesian networks is ill-posed. In many cases it cannot uniquely identify the underlying causal network and only gives a large class of equivalent causal networks that cannot be distinguished between based on the data distribution. We propose a new discovery algorithm for uniquely identifying the underlying causal network of genes. To the best of our knowledge, the proposed method is the first algorithm for learning gene networks based on a fully identifiable causal model called LiNGAM. We here compare our algorithm with competing algorithms using artificially-generated data, although it is definitely better to test it based on real microarray gene expression data.