Bayesian Inference
Multilevel neural simulation-based inference
Hikida, Yuga, Bharti, Ayush, Jeffrey, Niall, Briol, François-Xavier
Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally expensive, thereby limiting the number of simulations that can be performed. In this paper, we propose a novel approach to neural SBI which leverages multilevel Monte Carlo techniques for settings where several simulators of varying cost and fidelity are available. We demonstrate through both theoretical analysis and extensive experiments that our method can significantly enhance the accuracy of SBI methods given a fixed computational budget.
Variational Inference for Quantum HyperNetworks
Nepote, Luca, Lhéritier, Alix, Bondoux, Nicolas, Kountouris, Marios, Filippone, Maurizio
Binary Neural Networks (BiNNs), which employ single-bit precision weights, have emerged as a promising solution to reduce memory usage and power consumption while maintaining competitive performance in large-scale systems. However, training BiNNs remains a significant challenge due to the limitations of conventional training algorithms. Quantum HyperNetworks offer a novel paradigm for enhancing the optimization of BiNN by leveraging quantum computing. Specifically, a Variational Quantum Algorithm is employed to generate binary weights through quantum circuit measurements, while key quantum phenomena such as superposition and entanglement facilitate the exploration of a broader solution space. In this work, we establish a connection between this approach and Bayesian inference by deriving the Evidence Lower Bound (ELBO), when direct access to the output distribution is available (i.e., in simulations), and introducing a surrogate ELBO based on the Maximum Mean Discrepancy (MMD) metric for scenarios involving implicit distributions, as commonly encountered in practice. Our experimental results demonstrate that the proposed methods outperform standard Maximum Likelihood Estimation (MLE), improving trainability and generalization.
Reflect-then-Plan: Offline Model-Based Planning through a Doubly Bayesian Lens
Jeong, Jihwan, Wang, Xiaoyu, Wang, Jingmin, Sanner, Scott, Poupart, Pascal
Offline reinforcement learning (RL) is crucial when online exploration is costly or unsafe but often struggles with high epistemic uncertainty due to limited data. Existing methods rely on fixed conservative policies, restricting adaptivity and generalization. To address this, we propose Reflect-then-Plan (RefPlan), a novel doubly Bayesian offline model-based (MB) planning approach. RefPlan unifies uncertainty modeling and MB planning by recasting planning as Bayesian posterior estimation. At deployment, it updates a belief over environment dynamics using real-time observations, incorporating uncertainty into MB planning via marginalization. Empirical results on standard benchmarks show that RefPlan significantly improves the performance of conservative offline RL policies. In particular, RefPlan maintains robust performance under high epistemic uncertainty and limited data, while demonstrating resilience to changing environment dynamics, improving the flexibility, generalizability, and robustness of offline-learned policies.
Bayesian Inference for Correlated Human Experts and Classifiers
Kelly, Markelle, Boyd, Alex, Showalter, Sam, Steyvers, Mark, Smyth, Padhraic
Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions relative to baselines in the cost of querying human experts while maintaining high prediction accuracy.
Distributed Expectation Propagation for Multi-Object Tracking over Sensor Networks
Li, Qing, Gan, Runze, Hopgood, James R., Davies, Michael E., Godsill, Simon J.
--In this paper, we present a novel distributed expectation propagation algorithm for multiple sensors, multiple objects tracking in cluttered environments. The proposed framework enables each sensor to operate locally while collaboratively exchanging moment estimates with other sensors, thus eliminating the need to transmit all data to a central processing node. Specifically, we introduce a fast and parallelisable Rao-Blackwellised Gibbs sampling scheme to approximate the tilted distributions, which enhances the accuracy and efficiency of expectation propagation updates. Results demonstrate that the proposed algorithm improves both communication and inference efficiency for multi-object tracking tasks with dynamic sensor connectivity and varying clutter levels. Multi-sensor multi-object tracking (MS-MOT) is a fundamental task in applications such as autonomous systems and surveillance.
Amortized variational transdimensional inference
Davies, Laurence, Mackinlay, Dan, Oliveira, Rafael, Sisson, Scott A.
The expressiveness of flow-based models combined with stochastic variational inference (SVI) has, in recent years, expanded the application of optimization-based Bayesian inference to include problems with complex data relationships. However, until now, SVI using flow-based models has been limited to problems of fixed dimension. We introduce CoSMIC, normalizing flows (COntextually-Specified Masking for Identity-mapped Components), an extension to neural autoregressive conditional normalizing flow architectures that enables using a single amortized variational density for inference over a transdimensional target distribution. We propose a combined stochastic variational transdimensional inference (VTI) approach to training CoSMIC flows using techniques from Bayesian optimization and Monte Carlo gradient estimation. Numerical experiments demonstrate the performance of VTI on challenging problems that scale to high-cardinality model spaces.
Savage-Dickey density ratio estimation with normalizing flows for Bayesian model comparison
Lin, Kiyam, Polanska, Alicja, Piras, Davide, Mancini, Alessio Spurio, McEwen, Jason D.
A core motivation of science is to evaluate which scientific model best explains observed data. Bayesian model comparison provides a principled statistical approach to comparing scientific models and has found widespread application within cosmology and astrophysics. Calculating the Bayesian evidence is computationally challenging, especially as we continue to explore increasingly more complex models. The Savage-Dickey density ratio (SDDR) provides a method to calculate the Bayes factor (evidence ratio) between two nested models using only posterior samples from the super model. The SDDR requires the calculation of a normalised marginal distribution over the extra parameters of the super model, which has typically been performed using classical density estimators, such as histograms. Classical density estimators, however, can struggle to scale to high-dimensional settings. We introduce a neural SDDR approach using normalizing flows that can scale to settings where the super model contains a large number of extra parameters. We demonstrate the effectiveness of this neural SDDR methodology applied to both toy and realistic cosmological examples. For a field-level inference setting, we show that Bayes factors computed for a Bayesian hierarchical model (BHM) and simulation-based inference (SBI) approach are consistent, providing further validation that SBI extracts as much cosmological information from the field as the BHM approach. The SDDR estimator with normalizing flows is implemented in the open-source harmonic Python package.
Distributional encoding for Gaussian process regression with qualitative inputs
Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a highly prevailing method for the optimization of black-box functions. However, when all or some input variables are categorical, building a predictive and computationally efficient GP remains challenging. Starting from the naive target encoding idea, where the original categorical values are replaced with the mean of the target variable for that category, we propose a generalization based on distributional encoding (DE) which makes use of all samples of the target variable for a category. To handle this type of encoding inside the GP, we build upon recent results on characteristic kernels for probability distributions, based on the maximum mean discrepancy and the Wasserstein distance. We also discuss several extensions for classification, multi-task learning and incorporation or auxiliary information. Our approach is validated empirically, and we demonstrate state-of-the-art predictive performance on a variety of synthetic and real-world datasets. DE is naturally complementary to recent advances in BO over discrete and mixed-spaces.
Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach
Chen, Haoxuan, Ren, Yinuo, Min, Martin Renqiang, Ying, Lexing, Izzo, Zachary
Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs and avoid the usage of such approximations, we propose an ensemble-based algorithm that performs posterior sampling without the use of heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error between the true posterior distribution can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.
Progressive Tempering Sampler with Diffusion
Rissanen, Severi, OuYang, RuiKang, He, Jiajun, Chen, Wenlin, Heinonen, Markus, Solin, Arno, Hernández-Lobato, José Miguel
Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun -- at considerable computational cost -- whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.