
Collaborating Authors: Paoletti, Nicola


Robust Counterfactual Inference in Markov Decision Processes

arXiv.org Artificial Intelligence

This paper addresses a key limitation in existing counterfactual inference methods for Markov Decision Processes (MDPs). Current approaches assume a specific causal model to make counterfactuals identifiable. However, there are usually many causal models that align with the observational and interventional distributions of an MDP, each yielding different counterfactual distributions, so fixing a particular causal model limits the validity (and usefulness) of counterfactual inference. We propose a novel non-parametric approach that computes tight bounds on counterfactual transition probabilities across all compatible causal models. Unlike previous methods that require solving prohibitively large optimisation problems (with variables that grow exponentially in the size of the MDP), our approach provides closed-form expressions for these bounds, making computation highly efficient and scalable for non-trivial MDPs. Once such an interval counterfactual MDP is constructed, our method identifies robust counterfactual policies that optimise the worst-case reward w.r.t. the uncertain interval MDP probabilities. We evaluate our method on various case studies, demonstrating improved robustness over existing methods.
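To make the robust optimisation step concrete, here is a minimal sketch of worst-case (robust) value iteration over an interval MDP, assuming the counterfactual transition bounds `P_lo` and `P_hi` have already been computed; the names and the greedy inner step are illustrative, not the paper's closed-form construction.

```python
import numpy as np

def worst_case_expectation(lo, hi, v):
    """Minimise p @ v over lo <= p <= hi, sum(p) = 1 (assumes the interval
    is feasible): greedily push the free mass towards low-value successors."""
    p = lo.copy()
    budget = 1.0 - p.sum()
    for j in np.argsort(v):                  # lowest values first
        extra = min(hi[j] - p[j], budget)
        p[j] += extra
        budget -= extra
        if budget <= 0.0:
            break
    return p @ v

def robust_value_iteration(P_lo, P_hi, R, gamma=0.95, n_iter=500):
    """P_lo, P_hi: (S, A, S) interval transition bounds; R: (S, A) rewards.
    Returns the worst-case optimal value function and a greedy policy."""
    S, A, _ = P_lo.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = np.array([[R[s, a] + gamma * worst_case_expectation(P_lo[s, a], P_hi[s, a], V)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)
```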


Certified Guidance for Planning with Deep Generative Models

arXiv.org Machine Learning

Deep generative models, such as generative adversarial networks and diffusion models, have recently emerged as powerful tools for planning tasks and behavior synthesis in autonomous systems. Various guidance strategies have been introduced to steer the generative process toward outputs that are more likely to satisfy the planning objectives. These strategies avoid the need for model retraining but do not provide any guarantee that the generated outputs will satisfy the desired planning objectives. To address this limitation, we introduce certified guidance, an approach that modifies a generative model, without retraining it, into a new model guaranteed to satisfy a given specification with probability one. We focus on Signal Temporal Logic (STL) specifications, which are rich enough to describe nontrivial planning tasks. Our approach leverages neural network verification techniques to systematically explore the latent spaces of the generative models, identifying latent regions that are certifiably correct with respect to the STL property of interest. We evaluate the effectiveness of our method on four planning benchmarks using GANs and diffusion models. Our results confirm that certified guidance produces generative models that are always correct, unlike existing guidance methods that are not certified.
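A hedged illustration of how such a certified model could be used at run time: assuming a verifier has returned a latent hyperrectangle `(box_lo, box_hi)` within which every decoded output satisfies the STL property, sampling is simply restricted to that region. All names here are hypothetical.

```python
import numpy as np

def sample_certified(generator, box_lo, box_hi, n=16, seed=0):
    """Draw latents uniformly from a verifier-certified hyperrectangle and
    decode them; by construction, every sample satisfies the property."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=(n, box_lo.shape[0]))
    z = box_lo + u * (box_hi - box_lo)       # stay inside the certified box
    return np.stack([generator(zi) for zi in z])
```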


Distilling Calibration via Conformalized Credal Inference

arXiv.org Artificial Intelligence

Deploying artificial intelligence (AI) models on edge devices involves a delicate balance between meeting stringent complexity constraints, such as limited memory and energy resources, and ensuring reliable performance in sensitive decision-making tasks. One way to enhance reliability is through uncertainty quantification via Bayesian inference. This approach, however, typically necessitates maintaining and running multiple models in an ensemble, which may exceed the computational limits of edge devices. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. In an offline phase, predictive probabilities generated by a high-complexity cloud-based model are leveraged to determine a threshold based on the typical divergence between the cloud and edge models. At run time, this threshold is used to construct credal sets -- ranges of predictive probabilities that are guaranteed, with a user-selected confidence level, to include the predictions of the cloud model. The credal sets are obtained through thresholding of a divergence measure in the simplex of predictive probabilities. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance compared to low-complexity Bayesian methods, such as Laplace approximation, making it a practical and efficient solution for edge AI deployments.
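The two phases can be sketched as follows; the choice of the KL divergence and its direction are assumptions made for illustration, as the abstract only specifies a divergence measure on the probability simplex.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between predictive distributions (assumed divergence)."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def calibrate_threshold(edge_probs, cloud_probs, alpha=0.1):
    """Offline phase: conformal (1 - alpha) quantile of the divergences
    between cloud and edge predictions on held-out calibration inputs."""
    scores = np.sort(kl(cloud_probs, edge_probs))
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return scores[k - 1]

def in_credal_set(edge_prob, candidate, tau):
    """Run time: the credal set contains all distributions within tau of the
    edge prediction; with the chosen confidence, it contains the cloud's."""
    return kl(candidate, edge_prob) <= tau
```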


Verifiably Robust Conformal Prediction

arXiv.org Artificial Intelligence

Conformal Prediction (CP) is a popular uncertainty quantification method that provides distribution-free, statistically valid prediction sets, assuming that training and test data are exchangeable. In such a case, CP's prediction sets are guaranteed to cover the (unknown) true test output with a user-specified probability. Nevertheless, this guarantee is violated when the data is subjected to adversarial attacks, which often result in a significant loss of coverage. Recently, several approaches have been put forward to recover CP guarantees in this setting. These approaches leverage variations of randomised smoothing to produce conservative sets which account for the effect of the adversarial perturbations. They are, however, limited in that they only support $\ell^2$-bounded perturbations and classification tasks. This paper introduces VRCP (Verifiably Robust Conformal Prediction), a new framework that leverages recent neural network verification methods to recover coverage guarantees under adversarial attacks. Our VRCP method is the first to support perturbations bounded by arbitrary norms including $\ell^1$, $\ell^2$, and $\ell^\infty$, as well as regression tasks. We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and TinyImageNet) and regression tasks for deep reinforcement learning environments. In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the state of the art.
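As a sketch of the idea for classification (assumptions noted in the comments): suppose a verifier returns `prob_hi[y]`, an upper bound on the softmax probability of class $y$ over all admissible perturbations of the test input. With the usual score $s(x, y) = 1 - p_y(x)$, the quantity `1 - prob_hi[y]` lower-bounds the perturbed score, yielding a conservative inclusion rule.

```python
import numpy as np

def vrcp_set(cal_scores, prob_hi, alpha=0.1):
    """Conservative conformal prediction set: include class y whenever its
    score could fall below the calibration quantile under some admissible
    perturbation. `prob_hi` is a hypothetical verifier output."""
    n = len(cal_scores)
    q = np.sort(cal_scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]
    return [y for y in range(len(prob_hi)) if 1.0 - prob_hi[y] <= q]
```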


Conformal Off-Policy Prediction for Multi-Agent Systems

arXiv.org Machine Learning

Off-Policy Prediction (OPP), i.e., predicting the outcomes of a target policy using only data collected under a nominal (behavioural) policy, is a paramount problem in data-driven analysis of safety-critical systems where the deployment of a new policy may be unsafe. To achieve dependable off-policy predictions, recent work on Conformal Off-Policy Prediction (COPP) leverages the conformal prediction framework to derive prediction regions with probabilistic guarantees under the target process. Existing COPP methods can account for the distribution shifts induced by policy switching, but are limited to single-agent systems and scalar outcomes (e.g., rewards). In this work, we introduce MA-COPP, the first conformal prediction method to solve OPP problems involving multi-agent systems, deriving joint prediction regions (JPRs) for all agents' trajectories when one or more "ego" agents change their policies. Unlike the single-agent scenario, this setting introduces higher complexity as the distribution shifts affect predictions for all agents, not just the ego agents, and the prediction task involves full multi-dimensional trajectories, not just reward values. A key contribution of MA-COPP is to avoid enumeration or exhaustive search of the output space of agent trajectories, which is instead required by existing COPP methods to construct the prediction region. We achieve this by showing that an over-approximation of the true JPR can be constructed, without enumeration, from the maximum density ratio of the JPR trajectories. We evaluate the effectiveness of MA-COPP in multi-agent systems from the PettingZoo library and the F1TENTH autonomous racing environment, achieving nominal coverage in higher dimensions and various shift settings.
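One simplified reading of the maximum-density-ratio idea (not the paper's exact construction): if the ratio between target and behavioural trajectory densities is bounded by `ratio_max` on the region of interest, then miscoverage $\alpha$ under the target policies is implied by miscoverage $\alpha\,/$`ratio_max` under the behavioural data, so the region can be calibrated without enumerating trajectories.

```python
import numpy as np

def over_approx_quantile(cal_scores, ratio_max, alpha=0.1):
    """Conservative score quantile for an over-approximated joint prediction
    region: P_target(score > q) <= ratio_max * P_behav(score > q) <= alpha."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1.0 - alpha / ratio_max)))
    if k > n:
        return np.inf                        # too few calibration trajectories
    return np.sort(cal_scores)[k - 1]
```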


Counterfactual Influence in Markov Decision Processes

arXiv.org Artificial Intelligence

Our work addresses a fundamental problem in the context of counterfactual inference for Markov Decision Processes (MDPs). Given an MDP path $\tau$, this kind of inference allows us to derive counterfactual paths $\tau'$ describing what-if versions of $\tau$ obtained under different action sequences than those observed in $\tau$. However, as the counterfactual states and actions deviate from the observed ones over time, the observation $\tau$ may no longer influence the counterfactual world, meaning that the analysis is no longer tailored to the individual observation, resulting in interventional outcomes rather than counterfactual ones. Even though this issue specifically affects the popular Gumbel-max structural causal model used for MDP counterfactuals, it has remained overlooked until now. In this work, we introduce a formal characterisation of influence based on comparing counterfactual and interventional distributions. We devise an algorithm to construct counterfactual models that automatically satisfy influence constraints. Leveraging such models, we derive counterfactual policies that are not just optimal for a given reward structure but also remain tailored to the observed path. Even though there is an unavoidable trade-off between policy optimality and strength of influence constraints, our experiments demonstrate that it is possible to derive (near-)optimal policies while remaining under the influence of the observation.
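For context, a standard Gumbel-max counterfactual step looks as follows, using the top-down truncated-Gumbel construction; the influence constraints that the paper adds on top are not shown. Here `logits_obs` and `logits_cf` parameterise the observed and counterfactual transition distributions.

```python
import numpy as np

def gumbel_max_counterfactual(logits_obs, logits_cf, observed, rng):
    """Sample Gumbel noise consistent with the observed categorical outcome,
    then replay it under the counterfactual logits."""
    k = len(logits_obs)
    # The maximum of the perturbed logits is Gumbel(logsumexp)-distributed.
    top = rng.gumbel() + np.log(np.exp(logits_obs).sum())
    g = np.empty(k)
    g[observed] = top
    for i in range(k):
        if i != observed:
            gi = rng.gumbel() + logits_obs[i]
            g[i] = -np.log(np.exp(-top) + np.exp(-gi))  # truncate below top
    noise = g - logits_obs               # posterior noise, argmax == observed
    return int(np.argmax(logits_cf + noise))
```

Repeating this step along $\tau$ under a different action sequence yields a counterfactual path $\tau'$.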


Learning-Based Approaches to Predictive Monitoring with Conformal Statistical Guarantees

arXiv.org Artificial Intelligence

This tutorial focuses on efficient methods for predictive monitoring (PM), the problem of detecting at runtime future violations of a given requirement from the current state of a system. While performing model checking at runtime would offer a precise solution to the PM problem, it is generally computationally expensive. To address this scalability issue, several lightweight approaches based on machine learning have recently been proposed. These approaches work by learning an approximate yet efficient surrogate (deep learning) model of the expensive model checker. A key challenge remains to ensure reliable predictions, especially in safety-critical applications. We review our recent work on predictive monitoring, one of the first to propose learning-based approximations for the verification of temporal logic specifications in cyber-physical systems (CPS) and the first in this context to apply conformal prediction (CP) for rigorous uncertainty quantification. These CP-based uncertainty estimators offer statistical guarantees regarding the generalization error of the learning model, and they can be used to determine unreliable predictions that should be rejected. In this tutorial, we present a general and comprehensive framework summarizing our approach to the predictive monitoring of CPSs, examining in detail several variants determined by three main dimensions: system dynamics (deterministic, non-deterministic, stochastic), state observability, and semantics of requirements' satisfaction (Boolean or quantitative).
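A minimal sketch of the CP-based rejection idea for a learned (classification-style) monitor: calibrate a score threshold on held-out states labelled by the model checker, and reject a verdict whenever the resulting prediction set is not a singleton. Names are illustrative.

```python
import numpy as np

def conformal_rejector(cal_probs, cal_labels, alpha=0.05):
    """cal_probs: (n, 2) predicted probabilities of the surrogate monitor on
    calibration states; cal_labels: ground-truth 0/1 verdicts from the model
    checker. Returns a monitor that flags unreliable states."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q = np.sort(scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]

    def monitor(probs):
        pred_set = [y for y in (0, 1) if 1.0 - probs[y] <= q]
        return pred_set[0] if len(pred_set) == 1 else "reject"
    return monitor
```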


Probabilistic Reach-Avoid for Bayesian Neural Networks

arXiv.org Artificial Intelligence

Model-based reinforcement learning seeks to simultaneously learn the dynamics of an unknown stochastic environment and synthesise an optimal policy for acting in it. Ensuring the safety and robustness of sequential decisions made through a policy in such an environment is a key challenge for policies intended for safety-critical scenarios. In this work, we investigate two complementary problems: first, computing reach-avoid probabilities for iterative predictions made with dynamical models, with dynamics described by a Bayesian neural network (BNN); second, synthesising control policies that are optimal with respect to a given reach-avoid specification (reaching a "target" state, while avoiding a set of "unsafe" states) and a learned BNN model. Our solution leverages interval propagation and backward recursion techniques to compute lower bounds for the probability that a policy's sequence of actions leads to satisfying the reach-avoid specification. Such computed lower bounds provide safety certification for the given policy and BNN model. We then introduce control synthesis algorithms to derive policies maximizing said lower bounds on the safety probability. We demonstrate the effectiveness of our method on a series of control benchmarks characterized by learned BNN dynamics models. On our most challenging benchmark, compared to purely data-driven policies, the optimal synthesis algorithm is able to provide more than a four-fold increase in the number of certifiable states and more than a three-fold increase in the average guaranteed reach-avoid probability.
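A sketch of the backward recursion over a finite-state abstraction, assuming interval propagation has already produced `T_lo[s, s']`, a certified lower bound on each transition probability under the BNN posterior and a fixed policy (hypothetical inputs).

```python
import numpy as np

def reach_avoid_lower_bound(T_lo, target, unsafe, horizon):
    """Backward recursion: V[s] lower-bounds the probability of reaching the
    target within `horizon` steps while avoiding the unsafe states.
    target, unsafe: boolean masks over the abstract states."""
    V = target.astype(float)                 # V_H = 1 on target states
    for _ in range(horizon):
        V = T_lo @ V                         # conservative one-step expectation
        V[unsafe] = 0.0                      # entering the avoid set fails
        V[target] = 1.0                      # target already reached
    return V
```

Synthesis then amounts to choosing, at each state, the action that maximises this lower bound.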


Causal Temporal Reasoning for Markov Decision Processes

arXiv.org Artificial Intelligence

We introduce $\textit{PCFTL (Probabilistic CounterFactual Temporal Logic)}$, a new probabilistic temporal logic for the verification of Markov Decision Processes (MDPs). PCFTL is the first to include operators for causal reasoning, allowing us to express interventional and counterfactual queries. Given a path formula $\phi$, an interventional property is concerned with the satisfaction probability of $\phi$ if we apply a particular change $I$ to the MDP (e.g., switching to a different policy); a counterfactual allows us to compute, given an observed MDP path $\tau$, what the outcome of $\phi$ would have been had we applied $I$ in the past. For its ability to reason about \textit{what-if} scenarios involving different configurations of the MDP, our approach represents a departure from existing probabilistic temporal logics that can only reason about a fixed system configuration. From a syntactic viewpoint, we introduce a generalized counterfactual operator that subsumes both interventional and counterfactual probabilities as well as the traditional probabilistic operator found in, e.g., PCTL. From a semantics viewpoint, our logic is interpreted over a structural causal model translation of the MDP, which gives us a representation amenable to counterfactual reasoning. We evaluate PCFTL in the context of safe reinforcement learning using a benchmark of grid-world models.
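To illustrate the generalized operator (notation approximated from the abstract, not quoted from the paper), a counterfactual query might read

$$P_{\geq 0.9}^{\,I \mid \tau}\big(\lozenge_{\leq k}\, \mathit{goal}\big),$$

stating that, given the observed path $\tau$, had intervention $I$ (e.g., a policy switch) been applied, the probability of reaching $\mathit{goal}$ within $k$ steps would have been at least $0.9$. Dropping the conditioning on $\tau$ yields an interventional query, and dropping both $I$ and $\tau$ recovers the standard PCTL-style probabilistic operator.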


Conformal Quantitative Predictive Monitoring of STL Requirements for Stochastic Processes

arXiv.org Artificial Intelligence

We consider the problem of predictive monitoring (PM), i.e., predicting at runtime the satisfaction of a desired property from the current system's state. Due to its relevance for runtime safety assurance and online control, PM methods need to be efficient to enable timely interventions against predicted violations, while providing correctness guarantees. We introduce \textit{quantitative predictive monitoring (QPM)}, the first PM method to support stochastic processes and rich specifications given in Signal Temporal Logic (STL). Unlike most of the existing PM techniques that predict whether or not some property $\phi$ is satisfied, QPM provides a quantitative measure of satisfaction by predicting the quantitative (aka robust) STL semantics of $\phi$. QPM derives prediction intervals that are highly efficient to compute and with probabilistic guarantees, in that the intervals cover with arbitrary probability the STL robustness values relative to the stochastic evolution of the system. To do so, we take a machine-learning approach and leverage recent advances in conformal inference for quantile regression, thereby avoiding expensive Monte-Carlo simulations at runtime to estimate the intervals. We also show how our monitors can be combined in a compositional manner to handle composite formulas, without retraining the predictors nor sacrificing the guarantees. We demonstrate the effectiveness and scalability of QPM over a benchmark of four discrete-time stochastic processes with varying degrees of complexity.
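A compact sketch of the conformalized quantile regression step (in the style of CQR; variable names are illustrative): two quantile regressors predict lower and upper robustness quantiles, and calibration residuals inflate them into an interval with the desired coverage.

```python
import numpy as np

def cqr_interval(lo_pred, hi_pred, cal_lo, cal_hi, cal_y, alpha=0.1):
    """Conformalize quantile-regressor outputs into a prediction interval for
    the STL robustness value at the current state. cal_lo/cal_hi: regressor
    outputs on calibration states; cal_y: their true robustness values."""
    scores = np.maximum(cal_lo - cal_y, cal_y - cal_hi)  # CQR nonconformity
    n = len(scores)
    q = np.sort(scores)[int(np.ceil((n + 1) * (1 - alpha))) - 1]
    return lo_pred - q, hi_pred + q
```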