
Collaborating Authors: Zennaro, Fabio Massimo


Causal Abstraction Learning based on the Semantic Embedding Principle

arXiv.org Artificial Intelligence

Structural causal models (SCMs) allow us to investigate complex systems at multiple levels of resolution. The causal abstraction (CA) framework formalizes the mapping between high- and low-level SCMs. We address CA learning in a challenging and realistic setting, where SCMs are inaccessible, interventional data is unavailable, and sample data is misaligned. A key principle of our framework is $\textit{semantic embedding}$, formalized as the high-level distribution lying on a subspace of the low-level one. This principle naturally links linear CA to the geometry of the $\textit{Stiefel manifold}$. We present a category-theoretic approach to SCMs that enables the learning of a CA by finding a morphism between the low- and high-level probability measures, adhering to the semantic embedding principle. Consequently, we formulate a general CA learning problem. As an application, we solve the latter problem for linear CA, considering Gaussian measures and the Kullback-Leibler divergence as the objective. Given the nonconvexity of the learning task, we develop three algorithms building upon existing paradigms for Riemannian optimization. We demonstrate that the proposed methods succeed on both synthetic and real-world brain data with different degrees of prior information about the structure of the CA.
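
As a concrete illustration of the linear Gaussian case, here is a minimal NumPy sketch of Riemannian gradient descent on the Stiefel manifold: it minimizes the KL divergence between the pushforward of a low-level Gaussian and a high-level Gaussian. The setup (`Sigma_L`, `Sigma_H`, the step size, the QR retraction) is an invented toy instance, not a reproduction of the paper's three algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 3  # low- and high-level dimensions

# Synthetic low-level Gaussian N(0, Sigma_L) and a ground-truth embedding T_true.
A = rng.standard_normal((n, n))
Sigma_L = A @ A.T + n * np.eye(n)
T_true, _ = np.linalg.qr(rng.standard_normal((n, p)))  # a point on St(n, p)
Sigma_H = T_true.T @ Sigma_L @ T_true                  # implied high-level covariance

def kl_and_egrad(T):
    """KL( N(0, T'Sigma_L T) || N(0, Sigma_H) ) and its Euclidean gradient."""
    S = T.T @ Sigma_L @ T
    S_inv, H_inv = np.linalg.inv(S), np.linalg.inv(Sigma_H)
    kl = 0.5 * (np.trace(H_inv @ S) - p
                + np.linalg.slogdet(Sigma_H)[1] - np.linalg.slogdet(S)[1])
    return kl, Sigma_L @ T @ (H_inv - S_inv)

T = np.linalg.qr(rng.standard_normal((n, p)))[0]  # random starting point
for _ in range(2000):
    kl, G = kl_and_egrad(T)
    rgrad = G - T @ (T.T @ G + G.T @ T) / 2       # project onto tangent space
    Q, R = np.linalg.qr(T - 0.01 * rgrad)         # QR retraction back to St(n, p)
    T = Q * np.sign(np.diag(R))
print(f"final KL: {kl_and_egrad(T)[0]:.2e}")      # approaches 0, up to local optima
```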


Aligning Graphical and Functional Causal Abstractions

arXiv.org Artificial Intelligence

Causal abstractions allow us to relate causal models on different levels of granularity. To ensure that the models agree on cause and effect, frameworks for causal abstractions define notions of consistency. Two distinct methods for causal abstraction are common in the literature: (i) graphical abstractions, such as Cluster DAGs, which relate models on a structural level, and (ii) functional abstractions, like $\alpha$-abstractions, which relate models by maps between variables and their ranges. In this paper we align the notions of graphical and functional consistency and show an equivalence between the class of Cluster DAGs, consistent $\alpha$-abstractions, and constructive $\tau$-abstractions. Furthermore, we extend this alignment and the expressivity of graphical abstractions by introducing Partial Cluster DAGs. Our results provide a rigorous bridge between the functional and graphical frameworks and allow for the adoption and transfer of results between them.
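
As a toy illustration of the graphical side only, the Python sketch below (the example DAG and clustering are invented) induces a candidate Cluster DAG from a low-level DAG and checks the acyclicity that graphical consistency requires; the functional side ($\alpha$- and $\tau$-abstractions) is not modelled here.

```python
# Low-level DAG as parent lists, and a candidate clustering of micro-variables.
dag = {"X1": [], "X2": ["X1"], "Y1": ["X1", "X2"], "Y2": ["Y1"]}
cluster = {"X1": "X", "X2": "X", "Y1": "Y", "Y2": "Y"}

def induced_edges(dag, cluster):
    """Edges of the candidate Cluster DAG: one edge per cross-cluster arrow."""
    return {(cluster[p], cluster[v])
            for v, parents in dag.items() for p in parents
            if cluster[p] != cluster[v]}

def is_acyclic(nodes, edges):
    """Kahn's algorithm: a valid Cluster DAG must be acyclic."""
    indeg = {u: 0 for u in nodes}
    for _, v in edges:
        indeg[v] += 1
    frontier = [u for u in nodes if indeg[u] == 0]
    seen = 0
    while frontier:
        u = frontier.pop()
        seen += 1
        for (s, t) in edges:
            if s == u:
                indeg[t] -= 1
                if indeg[t] == 0:
                    frontier.append(t)
    return seen == len(nodes)

edges = induced_edges(dag, cluster)
print(edges, is_acyclic(set(cluster.values()), edges))  # {('X', 'Y')} True
```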


Causally Abstracted Multi-armed Bandits

arXiv.org Artificial Intelligence

Multi-armed bandits (MABs) and causal MABs (CMABs) are established frameworks for decision-making problems. Most prior work studies and solves individual MABs and CMABs in isolation for a given problem and its associated data. However, decision-makers are often faced with multiple related problems and multi-scale observations, where joint formulations are needed in order to efficiently exploit the problem structures and data dependencies. Transfer learning for CMABs addresses the situation where models are defined on identical variables, although causal connections may differ. In this work, we extend transfer learning to setups involving CMABs defined on potentially different variables, with varying degrees of granularity, and related via an abstraction map. Formally, we introduce the problem of causally abstracted MABs (CAMABs), relying on the theory of causal abstraction to express a rigorous abstraction map. We propose algorithms to learn in a CAMAB, and study their regret. We illustrate the limitations and the strengths of our algorithms in a real-world scenario related to online advertising.
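
The following toy sketch conveys one idea behind CAMABs, under invented numbers: low-level pulls, abstracted through a map `alpha`, warm-start a standard UCB learner on the high-level bandit. This is a simplification; the paper's algorithms and regret analysis also account for abstraction error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative CAMAB: six low-level interventions map onto three high-level
# ones via an abstraction map alpha; all reward means below are invented.
alpha = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
low_means = np.array([0.2, 0.3, 0.5, 0.6, 0.7, 0.9])  # unknown to the learner
high_means = np.array([0.25, 0.55, 0.8])

# Phase 1: a handful of low-level pulls, abstracted through alpha, give
# warm-start estimates for each high-level arm.
sums, cnts = np.zeros(3), np.zeros(3)
for a_low in range(6):
    for _ in range(5):
        sums[alpha[a_low]] += rng.binomial(1, low_means[a_low])
        cnts[alpha[a_low]] += 1

# Phase 2: standard UCB on the high-level bandit, starting from the
# transferred estimates instead of from scratch.
warm = cnts.copy()
for t in range(1, 2001):
    ucb = sums / cnts + np.sqrt(2 * np.log(t + cnts.sum()) / cnts)
    a = int(np.argmax(ucb))
    sums[a] += rng.binomial(1, high_means[a])
    cnts[a] += 1
print("high-level pulls per arm:", cnts - warm)  # concentrates on arm 2
```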


Interventionally Consistent Surrogates for Agent-based Simulators

arXiv.org Machine Learning

Agent-based models (ABMs) are a powerful tool for modelling complex decision-making systems across application domains, including the social sciences (Baptista et al., 2016), epidemiology (Kerr et al., 2021), and finance (Cont, 2007). Such models provide high-fidelity and granular representations of intricate systems of autonomous, interacting, and decision-making agents by modelling the system under consideration at the level of its individual constituent actors. In this way, ABMs enable decision-makers to experiment with, and understand the potential consequences of, policy interventions of interest, thereby allowing for more effective control of the potentially deleterious effects that arise from the endogenous dynamics of the real-world system. In economic systems, for example, such policy interventions may take the form of imposed limits on loan-to-value ratios in housing markets as a means for attenuating housing price cycles (Baptista et al., 2016), while in epidemiology, such interventions may take the form of (non-)pharmaceutical interventions to inhibit the transmission of a disease (Kerr et al., 2021). Whilst ABMs promise many benefits, their complexity generally necessitates the use of simulation studies to understand their behaviours, and their granularity can result in large computational costs even for single forward simulations. In many cases, such costs can be prohibitively large, presenting a barrier to their use as synthetic test environments for potential policy interventions in practice. Moreover, the high-fidelity data generated by ABMs can be difficult for policymakers to interpret and relate to policy interventions that act system-wide (Haldane and Turrell, 2018).
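
For readers unfamiliar with the mechanics, here is a deliberately tiny agent-based SIR model (all parameters invented; real simulators such as Covasim, by Kerr et al., 2021, are far richer) showing how such models operate at the level of individual agents and how an intervention acts directly on an individual-level mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def abm_sir(n_agents=500, beta=0.3, gamma=0.1, contacts=5, steps=100):
    """Each agent is simulated individually; an intervention acts directly on
    a mechanism, e.g. lowering `contacts` mimics a social-distancing policy."""
    state = np.zeros(n_agents, dtype=int)               # 0=S, 1=I, 2=R
    state[rng.choice(n_agents, 5, replace=False)] = 1   # seed infections
    peak = 0
    for _ in range(steps):
        infected = np.where(state == 1)[0]
        peak = max(peak, len(infected))
        for i in infected:                              # pairwise interactions
            for j in rng.integers(0, n_agents, contacts):
                if state[j] == 0 and rng.random() < beta:
                    state[j] = 1
        state[infected[rng.random(len(infected)) < gamma]] = 2  # recoveries
    return peak

print("peak infections, baseline:    ", abm_sir(contacts=5))
print("peak infections, intervention:", abm_sir(contacts=2))
```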


Causal Optimal Transport of Abstractions

arXiv.org Machine Learning

Causal abstraction (CA) theory establishes formal criteria for relating multiple structural causal models (SCMs) at different levels of granularity by defining maps between them. These maps have significant relevance for real-world challenges such as synthesizing causal evidence from multiple experimental environments, learning causally consistent representations at different resolutions, and linking interventions across multiple SCMs. In this work, we propose COTA, the first method to learn abstraction maps from observational and interventional data without assuming complete knowledge of the underlying SCMs. In particular, we introduce a multi-marginal Optimal Transport (OT) formulation that enforces do-calculus causal constraints, together with a cost function that relies on interventional information. We extensively evaluate COTA on synthetic and real-world problems, and showcase its advantages over non-causal, independent, and aggregated COTA formulations. Finally, we demonstrate the efficiency of our method as a data augmentation tool by comparing it against the state-of-the-art CA learning framework, which assumes fully specified SCMs, on a real-world downstream task.
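
A pairwise, two-intervention simplification of the idea can be sketched with the POT library: the transport cost combines a fit term for a candidate aggregation with a penalty on couplings that mix interventions. This is a crude stand-in for COTA's multi-marginal, do-calculus-constrained formulation; all data and the aggregation map are invented.

```python
import numpy as np
import ot  # Python Optimal Transport (pip install pot)

rng = np.random.default_rng(0)
n = 100

# Invented samples: each observation is tagged with the intervention that
# generated it (0 or 1), at both the low and the high level.
iv_low, iv_high = rng.integers(0, 2, n), rng.integers(0, 2, n)
X_low = rng.standard_normal((n, 2)) + 3.0 * iv_low[:, None]
X_high = rng.standard_normal((n, 1)) + 6.0 * iv_high[:, None]

# Base cost: distance between a candidate aggregation of the low level
# (here, the coordinate sum) and the high-level samples.
M = ot.dist(X_low.sum(axis=1, keepdims=True), X_high)
# Causal penalty: make it expensive to couple samples generated under
# different interventions -- a crude stand-in for do-calculus constraints.
M = M + 1e3 * (iv_low[:, None] != iv_high[None, :])

a = np.full(n, 1.0 / n)                     # uniform marginals
b = np.full(n, 1.0 / n)
G = ot.sinkhorn(a, b, M, reg=10.0)          # entropic OT coupling
leak = G[iv_low[:, None] != iv_high[None, :]].sum()
print(f"mass coupled across interventions: {leak:.2e}")  # essentially zero
```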


Jointly Learning Consistent Causal Abstractions Over Multiple Interventional Distributions

arXiv.org Artificial Intelligence

An abstraction can be used to relate two structural causal models representing the same system at different levels of resolution. Learning abstractions which guarantee consistency with respect to interventional distributions would allow one to jointly reason about evidence across multiple levels of granularity while respecting the underlying cause-effect relationships. In this paper, we introduce a first framework for causal abstraction learning between SCMs, based on the formalization of abstraction recently proposed by Rischel (2020). Building on this formalization, we propose a differentiable programming solution that jointly solves a number of combinatorial sub-problems, and we study its performance and benefits against independent and sequential approaches in synthetic settings and on a challenging real-world problem related to electric vehicle battery manufacturing.
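
In the spirit of Rischel's formalization, though stripped down to a single variable, two interventions, and invented numbers, abstraction error can be scored as the worst-case distance between the pushforward of each low-level interventional distribution and its high-level counterpart:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Surjective outcome map: low-level outcomes {0,1,2,3} -> high-level {0,1}.
a_map = np.array([0, 0, 1, 1])

def pushforward(p_low):
    """Marginalize a low-level distribution through the outcome map."""
    p_high = np.zeros(2)
    np.add.at(p_high, a_map, p_low)
    return p_high

# Interventional distributions at both levels (numbers invented).
P_low = {"do(T=0)": np.array([0.4, 0.3, 0.2, 0.1]),
         "do(T=1)": np.array([0.1, 0.1, 0.3, 0.5])}
P_high = {"do(T=0)": np.array([0.7, 0.3]),
          "do(T=1)": np.array([0.25, 0.75])}

# Per-intervention Jensen-Shannon distance, and the worst case overall.
errors = {i: jensenshannon(pushforward(P_low[i]), P_high[i], base=2)
          for i in P_low}
print(errors, "overall error:", max(errors.values()))
```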


Quantifying Consistency and Information Loss for Causal Abstraction Learning

arXiv.org Artificial Intelligence

Structural causal models provide a formalism to express causal relations between variables of interest. Models and variables can represent a system at different levels of abstraction, whereby relations may be coarsened and refined according to the needs of a modeller. However, switching between levels of abstraction requires evaluating a trade-off between consistency and information loss among the different models. In this paper we introduce a family of interventional measures that an agent may use to evaluate such a trade-off. We consider four measures suited for different tasks, analyze their properties, and propose algorithms to evaluate and learn causal abstractions. Finally, we illustrate the flexibility of our setup by empirically showing how different measures and algorithmic choices may lead to different abstractions.
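
A minimal observational sketch of the trade-off, with invented distributions (the paper's measures are interventional and considerably richer): each candidate outcome map is scored both on how consistent its pushforward is with the high-level model and on how much entropy the coarsening discards.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import entropy

p_low = np.array([0.35, 0.25, 0.25, 0.15])  # low-level distribution (invented)
p_high = np.array([0.6, 0.4])               # high-level model's distribution

def evaluate(a_map):
    """Score one candidate outcome map on both sides of the trade-off."""
    push = np.zeros(2)
    np.add.at(push, a_map, p_low)           # pushforward through the map
    err = jensenshannon(push, p_high, base=2)              # consistency error
    loss = entropy(p_low, base=2) - entropy(push, base=2)  # bits discarded
    return err, loss

for a_map in ([0, 0, 1, 1], [0, 1, 1, 1]):
    err, loss = evaluate(np.array(a_map))
    print(a_map, f"consistency error={err:.3f}, information loss={loss:.3f} bits")
```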


Towards Computing an Optimal Abstraction for Structural Causal Models

arXiv.org Artificial Intelligence

Working with causal models at different levels of abstraction is an important feature of science. Existing work has already considered the problem of formally expressing the relation of abstraction between causal models. In this paper, we focus on the problem of learning abstractions. We start by defining the learning problem formally in terms of the optimization of a standard measure of consistency. We then point out a limitation of this approach, and we suggest extending the objective function with a term accounting for information loss. We propose a concrete measure of information loss, and we illustrate its contribution to learning new abstractions.
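
One way to write the extended objective, in notation assumed here rather than taken from the paper: the learned abstraction minimizes a consistency error plus a weighted information-loss penalty,

$$\alpha^{\star} \;=\; \operatorname*{arg\,min}_{\alpha}\; E_{\mathrm{cons}}(\alpha) \;+\; \lambda\, \mathrm{IL}(\alpha), \qquad \lambda \ge 0,$$

where $E_{\mathrm{cons}}$ is the standard consistency measure, $\mathrm{IL}$ is the information-loss term, and $\lambda$ controls the trade-off between the two.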


Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges and Tabular Q-Learning

arXiv.org Artificial Intelligence

Penetration testing is a security exercise aimed at assessing the security of a system by simulating attacks against it. So far, penetration testing has been carried out mainly by trained human attackers, and its success has critically depended on the available expertise. Automating this practice constitutes a non-trivial problem, as the range of actions that a human expert may attempt against a system and the range of knowledge she relies on to make her decisions are hard to capture. In this paper, we focus our attention on simplified penetration testing problems expressed in the form of capture-the-flag hacking challenges, and we apply reinforcement learning algorithms to try to solve them. In modelling these capture-the-flag competitions as reinforcement learning problems, we highlight the specific challenges that characterize penetration testing. We observe these challenges experimentally across a set of varied simulations, and we study how different reinforcement learning techniques may help us address them. In this way we show the feasibility of tackling penetration testing using reinforcement learning, and we highlight the challenges that must be taken into consideration, along with possible directions for solving them.
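
A minimal tabular Q-learning sketch on an invented three-action CTF environment (transition probabilities and rewards are illustrative, not taken from the paper) shows the basic modelling pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy capture-the-flag environment: states are levels of access gained
# (0 = none, ..., 3 = flag captured); actions and probabilities are invented.
N_STATES, N_ACTIONS = 4, 3   # actions: 0 = scan, 1 = exploit, 2 = brute force

def step(s, a):
    if s < 3 and a == 1 and rng.random() < 0.6:    # exploit often progresses
        s += 1
    elif s < 3 and a == 2 and rng.random() < 0.2:  # brute force rarely does
        s += 1
    done = s == 3
    return s, (100.0 if done else -1.0), done      # step cost favours short attacks

Q = np.zeros((N_STATES, N_ACTIONS))
lr, gamma, eps = 0.1, 0.95, 0.1                    # tabular Q-learning settings
for _ in range(2000):
    s, done = 0, False
    while not done:
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        Q[s, a] += lr * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2
print("greedy action per state:", np.argmax(Q[:3], axis=1))  # expect all 1s
```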


Towards Further Understanding of Sparse Filtering via Information Bottleneck

arXiv.org Machine Learning

In this paper we examine a formalization of feature distribution learning (FDL) in information-theoretic terms, relying on the analytical approach and tools already used in the study of the information bottleneck (IB). It has been conjectured that the behavior of FDL algorithms could be expressed as an optimization problem over two information-theoretic quantities: the mutual information of the data with the learned representations and the entropy of the learned distribution. In particular, such a formulation was offered to explain the success of the most prominent FDL algorithm, sparse filtering (SF). This conjecture was, however, left unproven. In this work, we aim to provide preliminary empirical support for this conjecture by performing experiments reminiscent of the work done on deep neural networks in the context of IB research. Specifically, we borrow the idea of using information planes to analyze the behavior of the SF algorithm and gain insights into its dynamics. A confirmation of the conjecture about the dynamics of FDL may provide solid ground for developing information-theoretic tools to assess the quality of the learning process in FDL, and such tools may be extended to other unsupervised learning algorithms.
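
For reference, a minimal sketch of the sparse filtering objective itself (Ngiam et al., 2011); the information-plane analysis discussed above would additionally require estimating mutual information, e.g. by binning, which is omitted here. Data sizes and the optimizer budget are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 200))   # 20 input dimensions, 200 samples
n_feat, eps = 8, 1e-8

def sf_objective(w):
    """Sparse filtering: soft-absolute features, per-feature then per-sample
    L2 normalization, and an L1 sparsity objective."""
    W = w.reshape(n_feat, X.shape[0])
    F = np.sqrt((W @ X) ** 2 + eps)                    # soft absolute value
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # normalize each feature row
    F = F / np.linalg.norm(F, axis=0, keepdims=True)   # normalize each sample column
    return F.sum()                                     # L1 norm (entries are >= 0)

w0 = rng.standard_normal(n_feat * X.shape[0])
res = minimize(sf_objective, w0, method="L-BFGS-B", options={"maxiter": 50})
print(f"objective: {sf_objective(w0):.2f} -> {res.fun:.2f}")
```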