Bayesian Learning
A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives
Chazal, Clรฉmentine, Kanagawa, Heishiro, Shen, Zheyang, Korba, Anna, Oates, Chris. J.
Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel gradient discrepancy' (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel charasterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point several novel algorithms are proposed, including a natural generalisation of Stein variational gradient descent, with applications to mean-field neural networks and prediction-centric uncertainty quantification presented. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.
An Interval Type-2 Version of Bayes Theorem Derived from Interval Probability Range Estimates Provided by Subject Matter Experts
Rickard, John T., Dembski, William A., Rickards, James
Bayesian inference is widely used in many different fields to test hypotheses against observations. In most such applications, an assumption is made of precise input values to produce a precise output value. However, this is unrealistic for real-world applications. Often the best available information from subject matter experts (SMEs) in a given field is interval range estimates of the input probabilities involved in Bayes Theorem. This paper provides two key contributions to extend Bayes Theorem to an interval type-2 (IT2) version. First, we develop an IT2 version of Bayes Theorem that uses a novel and conservative method to avoid potential inconsistencies in the input IT2 MFs that otherwise might produce invalid output results. We then describe a novel and flexible algorithm for encoding SME-provided intervals into IT2 fuzzy membership functions (MFs), which we can use to specify the input probabilities in Bayes Theorem. Our algorithm generalizes and extends previous work on this problem that primarily addressed the encoding of intervals into word MFs for Computing with Words applications.
Uncertainty Estimation by Human Perception versus Neural Models
Mendes, Pedro, Romano, Paolo, Garlan, David
Modern neural networks (NNs) often achieve high predictive accuracy but are poorly calibrated, producing overconfident predictions even when wrong. This miscalibration poses serious challenges in applications where reliable uncertainty estimates are critical. In this work, we investigate how human perceptual uncertainty compares to uncertainty estimated by NNs. Using three vision benchmarks annotated with both human disagreement and crowdsourced confidence, we assess the correlation between model-predicted uncertainty and human-perceived uncertainty. Our results show that current methods only weakly align with human intuition, with correlations varying significantly across tasks and uncertainty metrics. Notably, we find that incorporating human-derived soft labels into the training process can improve calibration without compromising accuracy. These findings reveal a persistent gap between model and human uncertainty and highlight the potential of leveraging human insights to guide the development of more trustworthy AI systems.
A Comprehensive Guide to Differential Privacy: From Theory to User Expectations
Karmitsa, Napsu, Airola, Antti, Pahikkala, Tapio, Pitkรคmรคki, Tinja
The increasing availability of personal data has enabled significant advances in fields such as machine learning, healthcare, and cybersecurity. However, this data abundance also raises serious privacy concerns, especially in light of powerful re-identification attacks and growing legal and ethical demands for responsible data use. Differential privacy (DP) has emerged as a principled, mathematically grounded framework for mitigating these risks. This review provides a comprehensive survey of DP, covering its theoretical foundations, practical mechanisms, and real-world applications. It explores key algorithmic tools and domain-specific challenges - particularly in privacy-preserving machine learning and synthetic data generation. The report also highlights usability issues and the need for improved communication and transparency in DP systems. Overall, the goal is to support informed adoption of DP by researchers and practitioners navigating the evolving landscape of data privacy.
RESPLE: Recursive Spline Estimation for LiDAR-Based Odometry
Cao, Ziyu, Talbot, William, Li, Kailai
We present a novel recursive Bayesian estimation framework using B-splines for continuous-time 6-DoF dynamic motion estimation. The state vector consists of a recurrent set of position control points and orientation control point increments, enabling efficient estimation via a modified iterated extended Kalman filter without involving error-state formulations. The resulting recursive spline estimator (RESPLE) is further leveraged to develop a versatile suite of direct LiDAR-based odometry solutions, supporting the integration of one or multiple LiDARs and an IMU. We conduct extensive real-world evaluations using public datasets and our own experiments, covering diverse sensor setups, platforms, and environments. Compared to existing systems, RESPLE achieves comparable or superior estimation accuracy and robustness, while attaining real-time efficiency. Our results and analysis demonstrate RESPLE's strength in handling highly dynamic motions and complex scenes within a lightweight and flexible design, showing strong potential as a universal framework for multi-sensor motion estimation. We release the source code and experimental datasets at https://github.com/ASIG-X/RESPLE .
A Minimalist Bayesian Framework for Stochastic Optimization
The Bayesian paradigm offers principled tools for sequential decision-making under uncertainty, but its reliance on a probabilistic model for all parameters can hinder the incorporation of complex structural constraints. We introduce a minimalist Bayesian framework that places a prior only on the component of interest, such as the location of the optimum. Nuisance parameters are eliminated via profile likelihood, which naturally handles constraints. As a direct instantiation, we develop a MINimalist Thompson Sampling (MINTS) algorithm. Our framework accommodates structured problems, including continuum-armed Lipschitz bandits and dynamic pricing. It also provides a probabilistic lens on classical convex optimization algorithms such as the center of gravity and ellipsoid methods. We further analyze MINTS for multi-armed bandits and establish near-optimal regret guarantees.
Selective Induction Heads: How Transformers Select Causal Structures In Context
D'Angelo, Francesco, Croce, Francesco, Flammarion, Nicolas
Transformers have exhibited exceptional capabilities in sequence modeling tasks, leveraging self-attention and in-context learning. Critical to this success are induction heads, attention circuits that enable copying tokens based on their previous occurrences. In this work, we introduce a novel framework that showcases transformers' ability to dynamically handle causal structures. Existing works rely on Markov Chains to study the formation of induction heads, revealing how transformers capture causal dependencies and learn transition probabilities in-context. However, they rely on a fixed causal structure that fails to capture the complexity of natural languages, where the relationship between tokens dynamically changes with context. To this end, our framework varies the causal structure through interleaved Markov chains with different lags while keeping the transition probabilities fixed. This setting unveils the formation of Selective Induction Heads, a new circuit that endows transformers with the ability to select the correct causal structure in-context. We empirically demonstrate that transformers learn this mechanism to predict the next token by identifying the correct lag and copying the corresponding token from the past. We provide a detailed construction of a 3-layer transformer to implement the selective induction head, and a theoretical analysis proving that this mechanism asymptotically converges to the maximum likelihood solution. Our findings advance the understanding of how transformers select causal structures, providing new insights into their functioning and interpretability.
uGMM-NN: Univariate Gaussian Mixture Model Neural Network
Deep neural networks have transformed machine learning, excelling in tasks such as image classification and natural language processing through hierarchical feature learning [1]. However, traditional neurons, which compute deterministic weighted sums followed by nonlinear activations (e.g., ReLU, sigmoid), struggle to model uncertainty or multimodal distributions prevalent in real-world data. This limitation has historically been addressed by probabilistic graphical models, such as Bayesian Networks [2] and Markov Random Fields [3], which offer robust frameworks for uncertainty quantification and complex dependency modeling [4]. These models provide a strong conceptual foundation, but often lack the deep hierarchical feature learning capabilities of modern neural networks. A key research focus has therefore been on bridging the gap between these two paradigms. This has led to approaches that incorporate the probabilistic principles of graphical models directly into deep learning architectures. For example, Bayesian Neural Networks (BNNs) embed uncertainty into network weights [5], while Probabilistic Circuits (PCs), including Sum-Product Networks (SPNs) [7, 8, 9], are deep probabilistic models that build on a formal probabilistic structure, fusing the representational power of graphical models with the hierarchical feature learning of neural networks. In contrast, this paper introduces a novel approach by embedding a univariate Gaussian Mixture Model (uGMM) directly into the network's computational units, enabling each neuron to represent 1
Bayesian Pliable Lasso with Horseshoe Prior for Interaction Effects in GLMs with Missing Responses
Sparse regression problems, where the goal is to identify a small set of relevant predictors, often require modeling not only main effects but also meaningful interactions through other variables. While the pliable lasso has emerged as a powerful frequentist tool for modeling such interactions under strong heredity constraints, it lacks a natural framework for uncertainty quantification and incorporation of prior knowledge. In this paper, we propose a Bayesian pliable lasso that extends this approach by placing sparsity-inducing priors, such as the horseshoe, on both main and interaction effects. The hierarchical prior structure enforces heredity constraints while adaptively shrinking irrelevant coefficients and allowing important effects to persist. We extend this framework to Generalized Linear Models (GLMs) and develop a tailored approach to handle missing responses. To facilitate posterior inference, we develop an efficient Gibbs sampling algorithm based on a reparameterization of the horseshoe prior. Our Bayesian framework yields sparse, interpretable interaction structures, and principled measures of uncertainty. Through simulations and real-data studies, we demonstrate its advantages over existing methods in recovering complex interaction patterns under both complete and incomplete data. Our method is implemented in the package \texttt{hspliable} available on Github.
Nuclear Data Adjustment for Nonlinear Applications in the OECD/NEA WPNCS SG14 Benchmark -- A Bayesian Inverse UQ-based Approach for Data Assimilation
The Organization for Economic Cooperation and Development (OECD) Working Party on Nuclear Criticality Safety (WPNCS) proposed a benchmark exercise to assess the performance of current nuclear data adjustment techniques applied to nonlinear applications and experiments with low correlation to applications. This work introduces Bayesian Inverse Uncertainty Quantification (IUQ) as a method for nuclear data adjustments in this benchmark, and compares IUQ to the more traditional methods of Generalized Linear Least Squares (GLLS) and Monte Carlo Bayes (MOCABA). Posterior predictions from IUQ showed agreement with GLLS and MOCABA for linear applications. When comparing GLLS, MOCABA, and IUQ posterior predictions to computed model responses using adjusted parameters, we observe that GLLS predictions fail to replicate computed response distributions for nonlinear applications, while MOCABA shows near agreement, and IUQ uses computed model responses directly. We also discuss observations on why experiments with low correlation to applications can be informative to nuclear data adjustments and identify some properties useful in selecting experiments for inclusion in nuclear data adjustment. Performance in this benchmark indicates potential for Bayesian IUQ in nuclear data adjustments.