Collaborating Authors

Distributional Robustness of K-class Estimators and the PULSE Machine Learning

Recently, in causal discovery, invariance properties such as the moment criterion which two-stage least square estimator leverage have been exploited for causal structure learning: e.g., in cases, where the causal parameter is not identifiable, some structure of the non-zero components may be identified, and coverage guarantees are available. Subsequently, anchor regression has been proposed to trade-off invariance and predictability. The resulting estimator is shown to have optimal predictive performance under bounded shift interventions. In this paper, we show that the concepts of anchor regression and K-class estimators are closely related. Establishing this connection comes with two benefits: (1) It enables us to prove robustness properties for existing K-class estimators when considering distributional shifts. And, (2), we propose a novel estimator in instrumental variable settings by minimizing the mean squared prediction error subject to the constraint that the estimator lies in an asymptotically valid confidence region of the causal parameter. We call this estimator PULSE (p-uncorrelated least squares estimator) and show that it can be computed efficiently, even though the underlying optimization problem is non-convex. We further prove that it is consistent. We perform simulation experiments illustrating that there are several settings including weak instrument settings, where PULSE outperforms other estimators and suffers from less variability.

Learning Joint Nonlinear Effects from Single-variable Interventions in the Presence of Hidden Confounders Machine Learning

We propose an approach to estimate the effect of multiple simultaneous interventions in the presence of hidden confounders. To overcome the problem of hidden confounding, we consider the setting where we have access to not only the observational data but also sets of single-variable interventions in which each of the treatment variables is intervened on separately. We prove identifiability under the assumption that the data is generated from a nonlinear continuous structural causal model with additive Gaussian noise. In addition, we propose a simple parameter estimation method by pooling all the data from different regimes and jointly maximizing the combined likelihood. We also conduct comprehensive experiments to verify the identifiability result as well as to compare the performance of our approach against a baseline on both synthetic and real-world data.

Structure Learning for Directed Trees Machine Learning

Knowing the causal structure of a system is of fundamental interest in many areas of science and can aid the design of prediction algorithms that work well under manipulations to the system. The causal structure becomes identifiable from the observational distribution under certain restrictions. To learn the structure from data, score-based methods evaluate different graphs according to the quality of their fits. However, for large nonlinear models, these rely on heuristic optimization approaches with no general guarantees of recovering the true causal structure. In this paper, we consider structure learning of directed trees. We propose a fast and scalable method based on Chu-Liu-Edmonds' algorithm we call causal additive trees (CAT). For the case of Gaussian errors, we prove consistency in an asymptotic regime with a vanishing identifiability gap. We also introduce a method for testing substructure hypotheses with asymptotic family-wise error rate control that is valid post-selection and in unidentified settings. Furthermore, we study the identifiability gap, which quantifies how much better the true causal model fits the observational distribution, and prove that it is lower bounded by local properties of the causal model. Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods.

CAM: Causal additive models, high-dimensional order search and penalized regression Machine Learning

We develop estimation for potentially high-dimensional additive structural equation models. A key component of our approach is to decouple order search among the variables from feature or edge selection in a directed acyclic graph encoding the causal structure. We show that the former can be done with nonregularized (restricted) maximum likelihood estimation while the latter can be efficiently addressed using sparse regression techniques. Thus, we substantially simplify the problem of structure search and estimation for an important class of causal models. We establish consistency of the (restricted) maximum likelihood estimator for low- and high-dimensional scenarios, and we also allow for misspecification of the error distribution. Furthermore, we develop an efficient computational algorithm which can deal with many variables, and the new method's accuracy and performance is illustrated on simulated and real data.

Orthogonal Structure Search for Efficient Causal Discovery from Observational Data Machine Learning

A more formal discussion explanatory variables is of high practical importance is provided in Section 2. in many disciplines. Recent work exploits stability of regression coefficients or invariance However most state of the art methods suffer from scalability properties of models across different experimental problems since they scan all potential subsets of variables conditions for reconstructing the full causal and test whether the conditional distribution of Y given graph. These approaches generally do not scale a subset of variables is invariant across all environments well with the number of the explanatory variables (Peters et al., 2016) . This search is hence exponential in and are difficult to extend to nonlinear relationships. the number of covariates; the methods, while maintaining Contrary to existing work, we propose an appealing theoretical guarantees, are thus already computationally approach which even works for observational data hard for graphs of ten variables, and get infeasible alone, while still offering theoretical guarantees for larger graphs, unless one resorts to heuristic procedures.