Shifts in environment between development and deployment cause classical supervised learning to produce models that fail to generalize well to new target distributions. Recently, many solutions which find invariant predictive distributions have been developed. Among these, graph-based approaches do not require data from the target environment and can capture more stable information than alternative methods which find stable feature sets. However, these approaches assume that the data generating process is known in the form of a full causal graph, which is generally not the case. In this paper, we propose I-SPEC, an end-to-end framework that addresses this shortcoming by using data to learn a partial ancestral graph (PAG). Using the PAG we develop an algorithm that determines an interventional distribution that is stable to the declared shifts; this subsumes existing approaches which find stable feature sets that are less accurate. We apply I-SPEC to a mortality prediction problem to show it can learn a model that is robust to shifts without needing upfront knowledge of the full causal DAG.
Recent work addressing model reliability and generalization has resulted in a variety of methods that seek to proactively address differences between the training and unknown target environments. While most methods achieve this by finding distributions that will be invariant across environments, we will show they do not necessarily find the same distributions which has implications for performance. In this paper we unify existing work on prediction using stable distributions by relating environmental shifts to edges in the graph underlying a prediction problem, and characterize stable distributions as those which effectively remove these edges. We then quantify the effect of edge deletion on performance in the linear case and corroborate the findings in a simulated and real data experiment.
Standard supervised learning breaks down under data distribution shift. However, the principle of independent causal mechanisms (ICM, Peters et al. (2017)) can turn this weakness into an opportunity: one can take advantage of distribution shift between different environments during training in order to obtain more robust models. We propose a new gradient-based learning framework whose objective function is derived from the ICM principle. We show theoretically and experimentally that neural networks trained in this framework focus on relations remaining invariant across environments and ignore unstable ones. Moreover, we prove that the recovered stable relations correspond to the true causal mechanisms under certain conditions. In both regression and classification, the resulting models generalize well to unseen scenarios where traditionally trained models fail.
Classical supervised learning produces unreliable models when training and target distributions differ, with most existing solutions requiring samples from the target domain. We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to differ as expressed in a causal selection diagram. Specifically, we remove variables generated by unstable mechanisms from the joint factorization to yield the Graph Surgery Estimator---an interventional distribution that is invariant to the differences across domains. We prove that the surgery estimator finds stable relationships in strictly more scenarios than previous approaches which only consider conditional relationships, and demonstrate this in simulated experiments. We also evaluate on real world data for which the true causal diagram is unknown, performing competitively against entirely data-driven approaches.
Learning under multi-environments often requires the ability of out-of-distribution generalization for the worst-environment performance guarantee. Some novel algorithms, e.g. Invariant Risk Minimization and Risk Extrapolation, build stable models by extracting invariant (causal) feature. However, it remains unclear how these methods learn to remove the environmental features. In this paper, we focus on the Risk Extrapolation (REx) and make attempts to fill this gap. We first propose a framework, Quasi-Distributional Robustness, to unify the Empirical Risk Minimization (ERM), the Robust Optimization (RO) and the Risk Extrapolation. Then, under this framework, we show that, comparing to ERM and RO, REx has a much larger robust region. Furthermore, based on our analysis, we propose a novel regularization method, Risk Variance Penalization (RVP), which is derived from REx. The proposed method is easy to implement, and has proper degree of penalization, and enjoys an interpretable tuning parameter. Finally, our experiments show that under certain conditions, the regularization strategy that encourages the equality of training risks has ability to discover relationships which do not exist in the training data. This provides important evidence to support that RVP is useful to discover causal models.