Bayesian Inference
Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors
Nagler, Thomas, Rügamer, David
Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.
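The predictive-resampling idea behind martingale posteriors can be sketched in a few lines. The example below is not the authors' procedure: in place of a PFN it assumes the simplest possible one-step-ahead predictive, the empirical distribution of everything seen so far (a Pólya urn), and the function name and its parameters are hypothetical. Each pass imputes future observations one at a time from the current predictive and records the mean of the completed sample; the spread of the recorded means plays the role of a posterior for the mean.

```python
import numpy as np


def martingale_posterior_mean(y, n_future=500, n_draws=200, seed=0):
    """Draw samples from a martingale posterior of the mean.

    Assumption: the predictive is the empirical distribution of the
    observed plus already-imputed points (Polya urn), not a PFN.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    draws = np.empty(n_draws)
    for b in range(n_draws):
        # Buffer holding the observed data followed by imputed future points.
        pool = np.empty(n + n_future)
        pool[:n] = y
        for t in range(n, n + n_future):
            # Sample the next point uniformly from everything seen so far.
            pool[t] = pool[rng.integers(0, t)]
        # The statistic of the completed sample is one posterior draw.
        draws[b] = pool.mean()
    return draws
```

Calling `martingale_posterior_mean([1.0, 2.0, 3.0, 4.0, 5.0])` returns an array of posterior draws whose quantiles can be read off as a credible interval for the mean. Swapping the urn step for draws from a fitted predictive model would be the natural extension, under the convergence conditions the paper studies.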
A simple estimator of the correlation kernel matrix of a determinantal point process
Gouriéroux, Christian, Lu, Yang
The Determinantal Point Process (DPP) is a flexible family of distributions for random sets defined on the finite state space {1, ..., d}, or equivalently for multivariate binary variables. This family is parameterized by either the L-ensemble kernel Σ, which is symmetric positive definite (SPD), or the correlation kernel matrix K, which is SPD with eigenvalues lying strictly between 0 and 1. The literature has considered maximum likelihood estimation (MLE) of Σ and K or its algorithmic analogues (Affandi et al., 2014; Brunel et al., 2017a,b), but it has since been shown that i) the likelihood function has at least 2
Understanding Task Representations in Neural Networks via Bayesian Ablation
Nam, Andrew, Campbell, Declan, Griffiths, Thomas, Cohen, Jonathan, Leslie, Sarah-Jane
Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
The Gaussian Latent Machine: Efficient Prior and Posterior Sampling for Inverse Problems
Kuric, Muhamed, Zach, Martin, Habring, Andreas, Unser, Michael, Pock, Thomas
We consider the problem of sampling from a product-of-experts-type model that encompasses many standard prior and posterior distributions commonly found in Bayesian imaging. We show that this model can be easily lifted into a novel latent variable model, which we refer to as a Gaussian latent machine. This leads to a general sampling approach that unifies and generalizes many existing sampling algorithms in the literature. Most notably, it yields a highly efficient and effective two-block Gibbs sampling approach in the general case, while also specializing to direct sampling algorithms in particular cases. Finally, we present detailed numerical experiments that demonstrate the efficiency and effectiveness of our proposed sampling approach across a wide range of prior and posterior sampling problems from Bayesian imaging.
Rapidly Varying Completely Random Measures for Modeling Extremely Sparse Networks
Kilian, Valentin, Guedj, Benjamin, Caron, François
Completely random measures (CRMs) are fundamental to Bayesian nonparametric models, with applications in clustering, feature allocation, and network analysis. A key quantity of interest is the Laplace exponent, whose asymptotic behavior determines how the random structures scale. When the Laplace exponent grows nearly linearly (a regime known as rapid variation), the induced models exhibit approximately linear growth in the number of clusters, features, or edges with sample size or network nodes. This regime is especially relevant for modeling sparse networks, yet existing CRM constructions lack tractability under rapid variation. We address this by introducing a new class of CRMs with index of variation $\alpha \in (0,1]$, defined as mixtures of stable or generalized gamma processes. These models offer interpretable parameters, include well-known CRMs as limiting cases, and retain analytical tractability through a tractable Laplace exponent and simple size-biased representation. We analyze the asymptotic properties of this CRM class and apply it to the Caron-Fox framework for sparse graphs. The resulting models produce networks with near-linear edge growth, aligning with empirical evidence from large-scale networks. Additionally, we present efficient algorithms for simulation and posterior inference, demonstrating practical advantages through experiments on real-world sparse network datasets.
Private Statistical Estimation via Truncation
Zampetakis, Manolis, Zhou, Felix
We introduce a novel framework for differentially private (DP) statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. Traditional approaches rely on problem-specific sensitivity analysis, limiting their applicability. By leveraging techniques from truncated statistics, we develop computationally efficient DP estimators for exponential family distributions, including Gaussian mean and covariance estimation, achieving near-optimal sample complexity. Previous works on exponential families only consider bounded or one-dimensional families. Our approach mitigates sensitivity through truncation while carefully correcting for the introduced bias using maximum likelihood estimation and DP stochastic gradient descent. Along the way, we establish improved uniform convergence guarantees for the log-likelihood function of exponential families, which may be of independent interest. Our results provide a general blueprint for DP algorithm design via truncated statistics.
Theory: Multidimensional Space of Events
This paper extends Bayesian probability theory by developing a multidimensional space of events (MDSE) theory that accounts for mutual influences between events and hypotheses sets. While traditional Bayesian approaches assume conditional independence between certain variables, real-world systems often exhibit complex interdependencies that limit classical model applicability. Building on established probabilistic foundations, our approach introduces a mathematical formalism for modeling these complex relationships. We developed the MDSE theory through rigorous mathematical derivation and validated it using three complementary methodologies: analytical proofs, computational simulations, and case studies drawn from diverse domains. Results demonstrate that MDSE successfully models complex dependencies with 15-20% improved prediction accuracy compared to standard Bayesian methods when applied to datasets with high interdimensionality. This theory particularly excels in scenarios with over 50 interrelated variables, where traditional methods show exponential computational complexity growth while MDSE maintains polynomial scaling. Our findings indicate that MDSE provides a viable mathematical foundation for extending Bayesian reasoning to complex systems while maintaining computational tractability. This approach offers practical applications in engineering challenges including risk assessment, resource optimization, and forecasting problems where multiple interdependent factors must be simultaneously considered.
Humble your Overconfident Networks: Unlearning Overfitting via Sequential Monte Carlo Tempered Deep Ensembles
Millard, Andrew, Zhao, Zheng, Murphy, Joshua, Maskell, Simon
Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) proposals into SMC, enabling efficient mini-batch based sampling. Our resulting SMCSGHMC algorithm outperforms standard stochastic gradient descent (SGD) and deep ensembles across image classification, out-of-distribution (OOD) detection, and transfer learning tasks. We further show that SMCSGHMC mitigates overfitting and improves calibration, providing a flexible, scalable pathway for converting pretrained neural networks into well-calibrated Bayesian models.
Wasserstein Barycenter Gaussian Process based Bayesian Optimization
Candelieri, Antonio, Ponti, Andrea, Archetti, Francesco
Gaussian Process based Bayesian Optimization is a widely applied algorithm to learn and optimize under uncertainty, well-known for its sample efficiency. However, recent research studies have demonstrated empirically, and with increasing frequency, that the Gaussian Process fitting procedure at its core could be its most relevant weakness. Fitting a Gaussian Process means tuning its kernel's hyperparameters to a set of observations, but the common Maximum Likelihood Estimation technique, usually appropriate for learning tasks, has shown several critical issues in Bayesian Optimization, making theoretical analysis of this algorithm an open challenge. Exploiting the analogy between Gaussian Processes and Gaussian Distributions, we present a new approach which uses a predefined set of hyperparameter values to fit as many Gaussian Processes and then combines them into a single model as a Wasserstein Barycenter of Gaussian Processes. We considered both "easy" test problems and others known to undermine the vanilla Bayesian Optimization algorithm. The new method, namely Wasserstein Barycenter Gaussian Process based Bayesian Optimization (WBGP-BO), proved promising and was able to converge to the optimum, contrary to vanilla Bayesian Optimization, even on the most "tricky" test problems.
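The analogy between Gaussian Processes and Gaussian distributions that this abstract relies on can be illustrated in the univariate case, where the 2-Wasserstein barycenter has a closed form: the barycenter of N(m_i, s_i^2) with weights w_i is again Gaussian, with mean sum_i w_i m_i and standard deviation sum_i w_i s_i. The sketch below applies that formula pointwise to the predictive means and standard deviations of several models. It is an illustration under that simplifying assumption, not the WBGP-BO procedure itself, and the function name is hypothetical.

```python
import numpy as np


def gaussian_w2_barycenter_1d(means, stds, weights=None):
    """2-Wasserstein barycenter of univariate Gaussians.

    means, stds: arrays of shape (k,) or (k, n_points), one row per model.
    weights: optional array of shape (k,); uniform if omitted.
    Returns the barycenter mean and standard deviation.
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    k = means.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to a probability vector
    # In one dimension both the mean and the standard deviation average
    # linearly under the W2 barycenter (the variance does not).
    bary_mean = np.tensordot(weights, means, axes=1)
    bary_std = np.tensordot(weights, stds, axes=1)
    return bary_mean, bary_std
```

For example, combining N(0, 1^2) and N(2, 3^2) with equal weights gives a barycenter with mean 1 and standard deviation 2, whereas a mixture of the two would be bimodal and non-Gaussian. Handling the full covariance structure of a Gaussian Process requires the multivariate barycenter, which has no such simple closed form.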
Attribution Projection Calculus: A Novel Framework for Causal Inference in Bayesian Networks
This paper introduces Attribution Projection Calculus (AP-Calculus), a novel mathematical framework for determining causal relationships in structured Bayesian networks. We investigate a specific network architecture with source nodes connected to destination nodes through intermediate nodes, where each input maps to a single label with maximum marginal probability. We prove that for each label, exactly one intermediate node acts as a deconfounder while others serve as confounders, enabling optimal attribution of features to their corresponding labels. The framework formalizes the dual nature of intermediate nodes as both confounders and deconfounders depending on the context, and establishes separation functions that maximize distinctions between intermediate representations. We demonstrate that the proposed network architecture is optimal for causal inference compared to alternative structures, including those based on Pearl's causal framework. AP-Calculus provides a comprehensive mathematical foundation for analyzing feature-label attributions, managing spurious correlations, quantifying information gain, ensuring fairness, and evaluating uncertainty in prediction models, including large language models. Theoretical verification shows that AP-Calculus not only extends but can also subsume traditional do-calculus for many practical applications, offering a more direct approach to causal inference in supervised learning contexts.