anticausal
Causal vs. Anticausal merging of predictors
We study the differences arising from merging predictors in the causal and anticausal directions using the same data.In particular we study the asymmetries that arise in a simple model where we merge the predictors using one binary variable as target and two continuous variables as predictors.We use Causal Maximum Entropy (CMAXENT) as inductive bias to merge the predictors, however, we expect similar differences to hold also when we use other merging methods that take into account asymmetries between cause and effect.We show that if we observe all bivariate distributions, the CMAXENT solution reduces to a logistic regression in the causal direction and Linear Discriminant Analysis (LDA) in the anticausal direction.Furthermore, we study how the decision boundaries of these two solutions differ whenever we observe only some of the bivariate distributions implications for Out-Of-Variable (OOV) generalisation.
Initial Results for Pairwise Causal Discovery Using Quantitative Information Flow
Giori, Felipe, Figueiredo, Flavio
Pairwise Causal Discovery is the task of determining causal, anticausal, confounded or independence relationships from pairs of variables. Over the last few years, this challenging task has promoted not only the discovery of novel machine learning models aimed at solving the task, but also discussions on how learning the causal direction of variables may benefit machine learning overall. In this paper, we show that Quantitative Information Flow (QIF), a measure usually employed for measuring leakages of information from a system to an attacker, shows promising results as features for the task. In particular, experiments with real-world datasets indicate that QIF is statistically tied to the state of the art. Our initial results motivate further inquiries on how QIF relates to causality and what are its limitations.
Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning for NLP
Jin, Zhijing, von Kügelgen, Julius, Ni, Jingwei, Vaidhya, Tejas, Kaushal, Ayush, Sachan, Mrinmaya, Schölkopf, Bernhard
The principle of independent causal mechanisms (ICM) states that generative processes of real world data consist of independent modules which do not influence or inform each other. While this idea has led to fruitful developments in the field of causal inference, it is not widely-known in the NLP community. In this work, we argue that the causal direction of the data collection process bears nontrivial implications that can explain a number of published NLP findings, such as differences in semi-supervised learning (SSL) and domain adaptation (DA) performance across different settings. We categorize common NLP tasks according to their causal direction and empirically assay the validity of the ICM principle for text data using minimum description length. We conduct an extensive meta-analysis of over 100 published SSL and 30 DA studies, and find that the results are consistent with our expectations based on causal insights. This work presents the first attempt to analyze the ICM principle in NLP, and provides constructive suggestions for future modeling choices. Code available at https://github.com/zhijing-jin/icm4nlp
Semi-Supervised Learning, Causality and the Conditional Cluster Assumption
von Kügelgen, Julius, Loog, Marco, Mey, Alexander, Schölkopf, Bernhard
While the success of semi-supervised learning (SSL) is still not fully understood, Sch\"olkopf et al. (2012) have established a link to the principle of independent causal mechanisms. They conclude that SSL should be impossible when predicting a target variable from its causes, but possible when predicting it from its effects. Since both these cases are somewhat restrictive, we extend their work by considering classification using cause and effect features at the same time, such as predicting a disease from both risk factors and symptoms. While standard SSL exploits information contained in the marginal distribution of the inputs (to improve our estimate of the conditional distribution of target given inputs), we argue that in our more general setting we can use information in the conditional of effect features given causal features. We explore how this insight generalizes the previous understanding, and how it relates to and can be exploited for SSL.