AITopics | Peters, Jonas

Collaborating Authors

Peters, Jonas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning

Weichwald, Sebastian, Mogensen, Søren Wengel, Lee, Tabitha Edith, Baumann, Dominik, Kroemer, Oliver, Guyon, Isabelle, Trimpe, Sebastian, Peters, Jonas, Pfister, Niklas

arXiv.org Machine LearningFeb-12-2022

Questions in causality, control, and reinforcement learning go beyond the classical machine learning task of prediction under i.i.d. observations. Instead, these fields consider the problem of learning how to actively perturb a system to achieve a certain effect on a response variable. Arguably, they have complementary views on the problem: In control, one usually aims to first identify the system by excitation strategies to then apply model-based design techniques to control the system. In (non-model-based) reinforcement learning, one directly optimizes a reward. In causality, one focus is on identifiability of causal structure. We believe that combining the different views might create synergies and this competition is meant as a first step toward such synergies. The participants had access to observational and (offline) interventional data generated by dynamical systems. Track CHEM considers an open-loop problem in which a single impulse at the beginning of the dynamics can be set, while Track ROBO considers a closed-loop problem in which control variables can be set at each time step. The goal in both tracks is to infer controls that drive the system to a desired state. Code is open-sourced ( https://github.com/LearningByDoingCompetition/learningbydoing-comp ) to reproduce the winning solutions of the competition and to facilitate trying out new methods on the competition tasks.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

arXiv.org Machine Learning

2202.06052

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)

Add feedback

Exploiting Independent Instruments: Identification and Distribution Generalization

Saengkyongam, Sorawit, Henckel, Leonard, Pfister, Niklas, Peters, Jonas

arXiv.org Machine LearningFeb-3-2022

When estimating the causal function between a vector of covariates X and a response Y in the presence of unobserved confounding, standard regression procedures such as ordinary least squares (OLS) are even asymptotically biased. Instrumental variable approaches (Wright, 1928; Imbens and Angrist, 1994; Newey, 2013) exploit the existence of exogenous heterogeneity in the form of an instrumental variable (IV) Z and estimate, under suitable conditions, the causal function consistently. Importantly, the errors in Y and the hidden confounders U should be uncorrelated with the instruments Z. Usually, this has to be argued for with background knowledge. When the data generating process is modeled by a structural causal model (SCM) (Pearl, 2009; Bongers et al., 2021) (so that the distribution is Markov with respect to the induced graph), then the above condition is satisfied if Y and U are d-separated from Z in the graph obtained by removing the edge from X to Y. Furthermore, in this case the errors in Y and U are even independent from Z. Using that the errors and instruments are not only uncorrelated but also independent comes with several benefits. For example, even in settings, where the causal function can be identified by classical approaches based on uncorrelatedness, the independence can be exploited to construct estimators that achieve the semiparametric efficiency bound, at least when the error distribution comes from a known, parametric family (Hansen et al., 2010). Furthermore, the independence constraint is stronger than uncorrelatedness and therefore yields stronger identifiability results, which has been reported in the field of econometrics (e.g., Imbens and Newey, 2009; Chesher, 2003).

artificial intelligence, causal function, machine learning, (16 more...)

arXiv.org Machine Learning

2202.01864

Country: North America > United States > Massachusetts > Middlesex County (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Structure Learning for Directed Trees

Jakobsen, Martin Emil, Shah, Rajen D., Bühlmann, Peter, Peters, Jonas

arXiv.org Machine LearningAug-19-2021

Knowing the causal structure of a system is of fundamental interest in many areas of science and can aid the design of prediction algorithms that work well under manipulations to the system. The causal structure becomes identifiable from the observational distribution under certain restrictions. To learn the structure from data, score-based methods evaluate different graphs according to the quality of their fits. However, for large nonlinear models, these rely on heuristic optimization approaches with no general guarantees of recovering the true causal structure. In this paper, we consider structure learning of directed trees. We propose a fast and scalable method based on Chu-Liu-Edmonds' algorithm we call causal additive trees (CAT). For the case of Gaussian errors, we prove consistency in an asymptotic regime with a vanishing identifiability gap. We also introduce a method for testing substructure hypotheses with asymptotic family-wise error rate control that is valid post-selection and in unidentified settings. Furthermore, we study the identifiability gap, which quantifies how much better the true causal model fits the observational distribution, and prove that it is lower bounded by local properties of the causal model. Simulation studies demonstrate the favorable performance of CAT compared to competing structure learning methods.

artificial intelligence, identifiability gap, optimization problem, (19 more...)

arXiv.org Machine Learning

2108.08871

Country:

North America > United States > Massachusetts (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Add feedback

Invariant Policy Learning: A Causal Perspective

Saengkyongam, Sorawit, Thams, Nikolaj, Peters, Jonas, Pfister, Niklas

arXiv.org Machine LearningJun-7-2021

In the past decade, contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over time or over different environments. In many real world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we tackle the problem of environmental shifts under the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved confounders are present and show that, in that case, an optimal invariant policy is guaranteed, under certain assumptions, to generalize across environments. Our results do not only provide a solution to the environmental shift problem but also establish concrete connections among causality, invariance and contextual bandits.

health & medicine, invariant policy, optimization problem, (16 more...)

arXiv.org Machine Learning

2106.00808

Country:

North America > United States (0.14)
Europe > Denmark (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Education (0.49)
Health & Medicine (0.48)
Marketing (0.34)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

Add feedback

Regularizing towards Causal Invariance: Linear Models with Proxies

Oberst, Michael, Thams, Nikolaj, Peters, Jonas, Sontag, David

arXiv.org Machine LearningMar-3-2021

We propose a method for learning linear models whose predictive performance is robust to causal interventions on unobserved variables, when noisy proxies of those variables are available. Our approach takes the form of a regularization term that trades off between in-distribution performance and robustness to interventions. Under the assumption of a linear structural causal model, we show that a single proxy can be used to create estimators that are prediction optimal under interventions of bounded strength. This strength depends on the magnitude of the measurement noise in the proxy, which is, in general, not identifiable. In the case of two proxy variables, we propose a modified estimator that is prediction optimal under interventions up to a known strength. We further show how to extend these estimators to scenarios where additional information about the "test time" intervention is available during training. We evaluate our theoretical findings in synthetic experiments and using real data of hourly pollution levels across several cities in China.

artificial intelligence, health & medicine, intervention, (19 more...)

arXiv.org Machine Learning

2103.02477

Country: Asia > China (0.48)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Distributional Robustness of K-class Estimators and the PULSE

Jakobsen, Martin Emil, Peters, Jonas

arXiv.org Machine LearningJul-21-2020

Recently, in causal discovery, invariance properties such as the moment criterion which two-stage least square estimator leverage have been exploited for causal structure learning: e.g., in cases, where the causal parameter is not identifiable, some structure of the non-zero components may be identified, and coverage guarantees are available. Subsequently, anchor regression has been proposed to trade-off invariance and predictability. The resulting estimator is shown to have optimal predictive performance under bounded shift interventions. In this paper, we show that the concepts of anchor regression and K-class estimators are closely related. Establishing this connection comes with two benefits: (1) It enables us to prove robustness properties for existing K-class estimators when considering distributional shifts. And, (2), we propose a novel estimator in instrumental variable settings by minimizing the mean squared prediction error subject to the constraint that the estimator lies in an asymptotically valid confidence region of the causal parameter. We call this estimator PULSE (p-uncorrelated least squares estimator) and show that it can be computed efficiently, even though the underlying optimization problem is non-convex. We further prove that it is consistent. We perform simulation experiments illustrating that there are several settings including weak instrument settings, where PULSE outperforms other estimators and suffers from less variability.

estimator, optimization problem, survey article, (21 more...)

arXiv.org Machine Learning

2005.03353

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > Experimental Study (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)

Add feedback

Theoretical Aspects of Cyclic Structural Causal Models

Bongers, Stephan, Peters, Jonas, Schölkopf, Bernhard, Mooij, Joris M.

arXiv.org Artificial IntelligenceAug-4-2018

Structural causal models (SCMs), also known as (non-parametric) structural equation models (SEMs), are widely used for causal modeling purposes. A large body of theoretical results is available for the special case in which cycles are absent (i.e., acyclic SCMs, also known as recursive SEMs). However, in many application domains cycles are abundantly present, for example in the form of feedback loops. In this paper, we provide a general and rigorous theory of cyclic SCMs. The paper consists of two parts: the first part gives a rigorous treatment of structural causal models, dealing with measure-theoretic and other complications that arise in the presence of cycles. In contrast with the acyclic case, in cyclic SCMs solutions may no longer exist, or if they exist, they may no longer be unique, or even measurable in general. We give several sufficient and necessary conditions for the existence of (unique) measurable solutions. We show how causal reasoning proceeds in these models and how this differs from the acyclic case. Moreover, we give an overview of the Markov properties that hold for cyclic SCMs. In the second part, we address the question of how one can marginalize an SCM (possibly with cycles) to a subset of the endogenous variables. We show that under a certain condition, one can effectively remove a subset of the endogenous variables from the model, leading to a more parsimonious marginal SCM that preserves the causal and counterfactual semantics of the original SCM on the remaining variables. Moreover, we show how the marginalization relates to the latent projection and to latent confounders, i.e. latent common causes.

artificial intelligence, scm, survey article, (17 more...)

arXiv.org Artificial Intelligence

1611.06221

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Add feedback

Kernel-based Tests for Joint Independence

Pfister, Niklas, Bühlmann, Peter, Schölkopf, Bernhard, Peters, Jonas

arXiv.org Machine LearningNov-4-2016

We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test achieves the significance level and that the bootstrap test achieves pointwise asymptotic significance level as well as pointwise asymptotic consistency (i.e., it is able to detect any type of fixed dependence in the large sample limit). The Gamma approximation does not come with these guarantees; however, it is computationally very fast and for small $d$, it performs well in practice. Finally, we apply the test to a problem in causal discovery.

artificial intelligence, dhsic, machine learning, (18 more...)

arXiv.org Machine Learning

1603.00285

Country:

North America > United States (0.27)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

BACKSHIFT: Learning causal cyclic graphs from unknown shift interventions

Rothenhäusler, Dominik, Heinze, Christina, Peters, Jonas, Meinshausen, Nicolai

Neural Information Processing SystemsDec-31-2015

We propose a simple method to learn linear causal cyclic models in the presence of latent variables. The method relies on equilibrium data of the model recorded under a specific kind of interventions (``shift interventions''). The location and strength of these interventions do not have to be known and can be estimated from the data. Our method, called BACKSHIFT, only uses second moments of the data and performs simple joint matrix diagonalization, applied to differences between covariance matrices. We give a sufficient and necessary condition for identifiability of the system, which is fulfilled almost surely under some quite general assumptions if and only if there are at least three distinct experimental settings, one of which can be pure observational data. We demonstrate the performance on some simulated data and applications in flow cytometry and financial time series.

artificial intelligence, health & medicine, intervention, (18 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Distinguishing cause from effect using observational data: methods and benchmarks

Mooij, Joris M., Peters, Jonas, Janzing, Dominik, Zscheischler, Jakob, Schölkopf, Bernhard

arXiv.org Artificial IntelligenceDec-24-2015

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63+-10 % and an AUC of 0.74+-0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.

janzing, upstream oil & gas, vascular disease, (26 more...)

arXiv.org Artificial Intelligence

1412.3773

Country:

North America > United States (1.00)
Asia (0.67)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance (0.92)
Materials (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback