A long-standing open research problem is how to use information from different experiments, including background knowledge, to infer causal relations. Recent developments have shown ways to use multiple data sets, provided they originate from identical experiments. We present the MCI-algorithm as the first method that can infer provably valid causal relations in the large sample limit from different experiments. It is fast, reliable and produces very clear and easily interpretable output. It is based on a result that shows that constraint-based causal discovery is decomposable into a candidate pair identification and subsequent elimination step that can be applied separately from different models.
Luo, Zhiyi (Shanghai Jiao Tong University) | Sha, Yuchen (Shanghai Jiao Tong University) | Zhu, Kenny Q. (Shanghai Jiao Tong University) | Hwang, Seung-Won (Yonsei University) | Wang, Zhongyuan (Microsoft Research Asia)
Commonsense causal reasoning is the process of capturing and understanding the causal dependencies amongst events and actions. Such events and actions can be expressed in terms, phrases or sentences in natural language text. Therefore, one possible way of obtaining causal knowledge is by extracting causal relations between terms or phrases from a large text corpus. However, causal relations in text are sparse, ambiguous, and sometimes implicit, and thus difficult to obtain. This paper attacks the problem of commonsense causality reasoning between short texts (phrases and sentences) using a data driven approach. We propose a framework that automatically harvests a network of causal-effect terms from a large web corpus. Backed by this network, we propose a novel and effective metric to properly model the causality strength between terms. We show these signals can be aggregated for causality reasonings between short texts, including sentences and phrases. In particular, our approach outperforms all previously reported results in the standard SEMEVAL COPA task by substantial margins.
We target the problem of accuracy and robustness in causal inference from finite data sets. Our aim is to combine the inherent robustness of the Bayesian approach with the theoretical strength and clarity of constraint-based methods. We use a Bayesian score to obtain probability estimates on the input statements used in a constraint-based procedure. These are subsequently processed in decreasing order of reliability, letting more reliable decisions take precedence in case of conflicts, until a single output model is obtained. Tests show that a basic implementation of the resulting Bayesian Constraint-based Causal Discovery (BCCD) algorithm already outperforms established procedures such as FCI and Conservative PC. It indicates which causal decisions in the output have high reliability and which do not. The approach is easily adapted to other application areas such as complex independence tests.
We introduce Joint Causal Inference (JCI), a powerful formulation of causal discovery from multiple datasets that allows to jointly learn both the causal structure and targets of interventions from statistical independences in pooled data. Compared with existing constraint-based approaches for causal discovery from multiple data sets, JCI offers several advantages: it allows for several different types of interventions in a unified fashion, it can learn intervention targets, it systematically pools data across different datasets which improves the statistical power of independence tests, and most importantly, it improves on the accuracy and identifiability of the predicted causal relations. A technical complication that arises in JCI is the occurrence of faithfulness violations due to deterministic relations. We propose a simple but effective strategy for dealing with this type of faithfulness violations. We implement it in ACID, a determinism-tolerant extension of Ancestral Causal Inference (ACI) (Magliacane et al., 2016), a recently proposed logic-based causal discovery method that improves reliability of the output by exploiting redundant information in the data. We illustrate the benefits of JCI with ACID with an evaluation on a simulated dataset.