Goto

Collaborating Authors

 Bayesian Inference


Probabilistic Artificial Intelligence

arXiv.org Artificial Intelligence

Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally associated with requiring aspects of human intelligence, such as playing games, translating languages, and driving cars. In recent years, there have been exciting advances in learning-based, data-driven approaches towards AI, and machine learning and deep learning have enabled computer systems to perceive the world in unprecedented ways. Reinforcement learning has enabled breakthroughs in complex games such as Go and challenging robotics tasks such as quadrupedal locomotion. A key aspect of intelligence is to not only make predictions, but reason about the uncertainty in these predictions, and to consider this uncertainty when making decisions. This is what this manuscript on "Probabilistic Artificial Intelligence" is about. The first part covers probabilistic approaches to machine learning. We discuss the differentiation between "epistemic" uncertainty due to lack of data and "aleatoric" uncertainty, which is irreducible and stems, e.g., from noisy observations and outcomes. We discuss concrete approaches towards probabilistic inference and modern approaches to efficient approximate inference. The second part of the manuscript is about taking uncertainty into account in sequential decision tasks. We consider active learning and Bayesian optimization -- approaches that collect data by proposing experiments that are informative for reducing the epistemic uncertainty. We then consider reinforcement learning and modern deep RL approaches that use neural network function approximation. We close by discussing modern approaches in model-based RL, which harness epistemic and aleatoric uncertainty to guide exploration, while also reasoning about safety.


In-context denoising with one-layer transformers: connections between attention and associative memory retrieval

arXiv.org Artificial Intelligence

We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.


Decentralized Online Ensembles of Gaussian Processes for Multi-Agent Systems

arXiv.org Machine Learning

Flexible and scalable decentralized learning solutions are fundamentally important in the application of multi-agent systems. While several recent approaches introduce (ensembles of) kernel machines in the distributed setting, Bayesian solutions are much more limited. We introduce a fully decentralized, asymptotically exact solution to computing the random feature approximation of Gaussian processes. We further address the choice of hyperparameters by introducing an ensembling scheme for Bayesian multiple kernel learning based on online Bayesian model averaging. The resulting algorithm is tested against Bayesian and frequentist methods on simulated and real-world datasets.


Time Series Analysis of Rankings: A GARCH-Type Approach

arXiv.org Machine Learning

Ranking data are frequently obtained nowadays but there are still scarce methods for treating these data when temporally observed. The present paper contributes to this topic by proposing and developing novel models for handling time series of ranking data. We introduce a class of time-varying ranking models inspired by the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) models. More specifically, the temporal dynamics are defined by the conditional distribution of the current ranking given the past rankings, which are assumed to follow a Mallows distribution, which implicitly depends on a distance. Then, autoregressive and feedback components are incorporated into the model through the conditional expectation of the associated distances. Theoretical properties of our ranking GARCH models such as stationarity and ergodicity are established. The estimation of parameters is performed via maximum likelihood estimation when data is fully observed. We develop a Monte Carlo Expectation-Maximisation algorithm to deal with cases involving missing data. Monte Carlo simulation studies are presented to study the performance of the proposed estimators under both non-missing and missing data scenarios. A real data application about the weekly ranking of professional tennis players from 2015 to 2019 is presented under our proposed ranking GARCH models.


Cyclic quantum causal modelling with a graph separation theorem

arXiv.org Machine Learning

Causal modelling frameworks link observable correlations to causal explanations, which is a crucial aspect of science. These models represent causal relationships through directed graphs, with vertices and edges denoting systems and transformations within a theory. Most studies focus on acyclic causal graphs, where well-defined probability rules and powerful graph-theoretic properties like the d-separation theorem apply. However, understanding complex feedback processes and exotic fundamental scenarios with causal loops requires cyclic causal models, where such results do not generally hold. While progress has been made in classical cyclic causal models, challenges remain in uniquely fixing probability distributions and identifying graph-separation properties applicable in general cyclic models. In cyclic quantum scenarios, existing frameworks have focussed on a subset of possible cyclic causal scenarios, with graph-separation properties yet unexplored. This work proposes a framework applicable to all consistent quantum and classical cyclic causal models on finite-dimensional systems. We address these challenges by introducing a robust probability rule and a novel graph-separation property, p-separation, which we prove to be sound and complete for all such models. Our approach maps cyclic causal models to acyclic ones with post-selection, leveraging the post-selected quantum teleportation protocol. We characterize these protocols and their success probabilities along the way. We also establish connections between this formalism and other classical and quantum frameworks to inform a more unified perspective on causality. This provides a foundation for more general cyclic causal discovery algorithms and to systematically extend open problems and techniques from acyclic informational networks (e.g., certification of non-classicality) to cyclic causal structures and networks.


Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts

arXiv.org Artificial Intelligence

The perspective of developing trustworthy AI for critical applications in science and engineering requires machine learning techniques that are capable of estimating their own uncertainty. In the context of regression, instead of estimating a conditional mean, this can be achieved by producing a predictive interval for the output, or to even learn a model of the conditional probability $p(y|x)$ of an output $y$ given input features $x$. While this can be done under parametric assumptions with, e.g. generalized linear model, these are typically too strong, and non-parametric models offer flexible alternatives. In particular, for scalar outputs, learning directly a model of the conditional cumulative distribution function of $y$ given $x$ can lead to more precise probabilistic estimates, and the use of proper scoring rules such as the weighted interval score (WIS) and the continuous ranked probability score (CRPS) lead to better coverage and calibration properties. This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions. These algorithms are made computationally efficient thanks to an appropriate use of known data structures - namely min-max heaps, weight-balanced binary trees and Fenwick trees. Through numerical experiments, we demonstrate that the performance of our methods is competitive with alternative approaches. Additionally, our methods benefit from the inherent interpretability and explainability of trees. As a by-product, we show how our trees can be used in the context of conformal prediction and explain why they are particularly well-suited for achieving group-conditional coverage guarantees.


Does Unsupervised Domain Adaptation Improve the Robustness of Amortized Bayesian Inference? A Systematic Evaluation

arXiv.org Machine Learning

Neural networks are fragile when confronted with data that significantly deviates from their training distribution. This is true in particular for simulation-based inference methods, such as neural amortized Bayesian inference (ABI), where models trained on simulated data are deployed on noisy real-world observations. Recent robust approaches employ unsupervised domain adaptation (UDA) to match the embedding spaces of simulated and observed data. However, the lack of comprehensive evaluations across different domain mismatches raises concerns about the reliability in high-stakes applications. We address this gap by systematically testing UDA approaches across a wide range of misspecification scenarios in both a controlled and a high-dimensional benchmark. We demonstrate that aligning summary spaces between domains effectively mitigates the impact of unmodeled phenomena or noise. However, the same alignment mechanism can lead to failures under prior misspecifications - a critical finding with practical consequences. Our results underscore the need for careful consideration of misspecification types when using UDA techniques to increase the robustness of ABI in practice.


PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders

arXiv.org Machine Learning

Learning informative representations of phylogenetic tree structures is essential for analyzing evolutionary relationships. Classical distance-based methods have been widely used to project phylogenetic trees into Euclidean space, but they are often sensitive to the choice of distance metric and may lack sufficient resolution. In this paper, we introduce phylogenetic variational autoencoders (PhyloVAEs), an unsupervised learning framework designed for representation learning and generative modeling of tree topologies. Leveraging an efficient encoding mechanism inspired by autoregressive tree topology generation, we develop a deep latent-variable generative model that facilitates fast, parallelized topology generation. Phylo-VAE combines this generative model with a collaborative inference model based on learnable topological features, allowing for high-resolution representations of phylogenetic tree samples. Extensive experiments demonstrate PhyloVAE's robust representation learning capabilities and fast generation of phylogenetic tree topologies. Phylogenetic trees are the foundational structure for describing the evolutionary processes among individuals or groups of biological entities. Reconstructing these trees based on collected biological sequences (e.g., DNA, RNA, protein) from observed species, also known as phylogenetic inference (Felsenstein, 2004), is an essential discipline of computational biology (Fitch, 1971; Felsenstein, 1981; Yang & Rannala, 1997; Ronquist et al., 2012). Large collections of trees obtained from these approaches (e.g., posterior samples from MCMC runs (Ronquist et al., 2012)), however, are often difficult to summarize or visualize due to the discrete and non-Euclidean nature of the tree topology space The classical approach to visualize and analyze distributions of phylogenetic trees is to calculate pairwise distances between the trees and project them into a plane using multidimensional scaling (MDS) (Amenta & Klingner, 2002; Hillis et al., 2005; Jombart et al., 2017). However, these approaches have the shortcoming that one can not map an arbitrary point in the visualization to a tree, and therefore do not form an actual visualization of the relevant tree space.


Active Learning of Model Discrepancy with Bayesian Experimental Design

arXiv.org Artificial Intelligence

Digital twins have been actively explored in many engineering applications, such as manufacturing and autonomous systems. However, model discrepancy is ubiquitous in most digital twin models and has significant impacts on the performance of using those models. In recent years, data-driven modeling techniques have been demonstrated promising in characterizing the model discrepancy in existing models, while the training data for the learning of model discrepancy is often obtained in an empirical way and an active approach of gathering informative data can potentially benefit the learning of model discrepancy. On the other hand, Bayesian experimental design (BED) provides a systematic approach to gathering the most informative data, but its performance is often negatively impacted by the model discrepancy. In this work, we build on sequential BED and propose an efficient approach to iteratively learn the model discrepancy based on the data from the BED. The performance of the proposed method is validated by a classical numerical example governed by a convection-diffusion equation, for which full BED is still feasible. The proposed method is then further studied in the same numerical example with a high-dimensional model discrepancy, which serves as a demonstration for the scenarios where full BED is not practical anymore. An ensemble-based approximation of information gain is further utilized to assess the data informativeness and to enhance learning model discrepancy. The results show that the proposed method is efficient and robust to the active learning of high-dimensional model discrepancy, using data suggested by the sequential BED. We also demonstrate that the proposed method is compatible with both classical numerical solvers and modern auto-differentiable solvers.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Rebuttal: thank you for your clarifications. I still think that learning kernel (parameters) from multiple realizations of a GP is not very novel in general, but sufficiently novel in your specific context to get discussed at NIPS. The authors use Gaussian processes to learn human function extrapolation behaviour from human sample data. After a comprehensive literature review, they introduce the main idea of the paper: learn the kernel parameters by maximizing the conditional probability of the extrapolation data given the training data. To allow for flexible kernel shapes, they use spectral mixture kernels.