Bayesian Inference
Multiparameter Uncertainty Mapping in Quantitative Molecular MRI using a Physics-Structured Variational Autoencoder (PS-VAE)
Finkelstein, Alex, Moneta, Ron, Zohar, Or, Rivlin, Michal, Zaiss, Moritz, Morvinski, Dinora Friedmann, Perlman, Or
Quantitative imaging methods, such as magnetic resonance fingerprinting (MRF), aim to extract interpretable pathology biomarkers by estimating biophysical tissue parameters from signal evolutions. However, the pattern-matching algorithms or neural networks used in such inverse problems often lack principled uncertainty quantification, which limits the trustworthiness and transparency, required for clinical acceptance. Here, we describe a physics-structured variational autoencoder (PS-VAE) designed for rapid extraction of voxelwise multi-parameter posterior distributions. Our approach integrates a differentiable spin physics simulator with self-supervised learning, and provides a full covariance that captures the inter-parameter correlations of the latent biophysical space. The method was validated in a multi-proton pool chemical exchange saturation transfer (CEST) and semisolid magnetization transfer (MT) molecular MRF study, across in-vitro phantoms, tumor-bearing mice, healthy human volunteers, and a subject with glioblastoma. The resulting multi-parametric posteriors are in good agreement with those calculated using a brute-force Bayesian analysis, while providing an orders-of-magnitude acceleration in whole brain quantification. In addition, we demonstrate how monitoring the multi-parameter posterior dynamics across progressively acquired signals provides practical insights for protocol optimization and may facilitate real-time adaptive acquisition.
Score-based diffusion models for diffuse optical tomography with uncertainty quantification
Schneider, Fabian, Mozumder, Meghdoot, Tamarov, Konstantin, Taghizadeh, Leila, Tarvainen, Tanja, Helin, Tapio, Duong, Duc-Lam
Score-based diffusion models are a recently developed framework for posterior sampling in Bayesian inverse problems with a state-of-the-art performance for severely ill-posed problems by leveraging a powerful prior distribution learned from empirical data. Despite generating significant interest especially in the machine-learning community, a thorough study of realistic inverse problems in the presence of modelling error and utilization of physical measurement data is still outstanding. In this work, the framework of unconditional representation for the conditional score function (UCoS) is evaluated for linearized difference imaging in diffuse optical tomography (DOT). DOT uses boundary measurements of near-infrared light to estimate the spatial distribution of absorption and scattering parameters in biological tissues. The problem is highly ill-posed and thus sensitive to noise and modelling errors. We introduce a novel regularization approach that prevents overfitting of the score function by constructing a mixed score composed of a learned and a model-based component. Validation of this approach is done using both simulated and experimental measurement data. The experiments demonstrate that a data-driven prior distribution results in posterior samples with low variance, compared to classical model-based estimation, and centred around the ground truth, even in the context of a highly ill-posed problem and in the presence of modelling errors.
Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures
Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures Joonhyuk Jung 1 and Chao Gao 1 1 Department of Statistics, University of Chicago Abstract We study the relation between the total variation (TV) and Hellinger distances between two Gaussian location mixtures. Our first result establishes a general upper bound: for any two mixing distributions supported on a compact set, the Hellinger distance between the two mixtures is controlled by the TV distance raised to a power 1 o(1), where the o(1) term is of order 1/log log(1/TV). We also construct two sequences of mixing distributions that demonstrate the sharpness of this bound. Taken together, our results resolve an open problem raised in Jia et al. (2023) and thus lead to an entropic characterization of learning Gaussian mixtures in total variation. Our inequality also yields optimal robust estimation of Gaussian mixtures in Hellinger distance, which has a direct implication for bounding the minimax regret of empirical Bayes under Huber contamination. 1 Introduction The Gaussian location mixture is one of the most fundamental models used in nonparametric density estimation, Bayesian inference, and clustering (Lindsay, 1995; Dasgupta, 1999). Given a probability measure ฯ supported on R d, the induced Gaussian mixture is defined by f ฯ(x) = null R dฯ d(x ฮธ)dฯ(ฮธ), where ฯ d(x) = (2ฯ) d/2 exp( x 2 2/2) is the density function of the d-dimensional standard Gaussian distribution. In this paper, we study the relation between the total variation distance TV(p,q):= 1 2 null |p q| and the Hellinger distance H(p,q):= null 1 2 null ( p q) 2 of two Gaussian mixture densities.
Importance Weighted Variational Inference without the Reparameterization Trick
Daudel, Kamรฉlia, Tran, Minh-Ngoc, Zhang, Cheng
Importance weighted variational inference (VI) approximates densities known up to a normalizing constant by optimizing bounds that tighten with the number of Monte Carlo samples $N$. Standard optimization relies on reparameterized gradient estimators, which are well-studied theoretically yet restrict both the choice of the data-generating process and the variational approximation. While REINFORCE gradient estimators do not suffer from such restrictions, they lack rigorous theoretical justification. In this paper, we provide the first comprehensive analysis of REINFORCE gradient estimators in importance weighted VI, leveraging this theoretical foundation to diagnose and resolve fundamental deficiencies in current state-of-the-art estimators. Specifically, we introduce and examine a generalized family of variational inference for Monte Carlo objectives (VIMCO) gradient estimators. We prove that state-of-the-art VIMCO gradient estimators exhibit a vanishing signal-to-noise ratio (SNR) as $N$ increases, which prevents effective optimization. To overcome this issue, we propose the novel VIMCO-$\star$ gradient estimator and show that it averts the SNR collapse of existing VIMCO gradient estimators by achieving a $\sqrt{N}$ SNR scaling instead. We demonstrate its superior empirical performance compared to current VIMCO implementations in challenging settings where reparameterized gradients are typically unavailable.
When Is Generalized Bayes Bayesian? A Decision-Theoretic Characterization of Loss-Based Updating
McAlinn, Kenichiro, Takanashi, Kลsaku
Loss-based updating, including generalized Bayes, Gibbs, and quasi-posteriors, replaces likelihoods by a user-chosen loss and produces a posterior-like distribution via exponential tilt. We give a decision-theoretic characterization that separates \emph{belief posteriors} -- conditional beliefs justified by the foundations of Savage and Anscombe-Aumann under a joint probability mode l-- from \emph{decision posteriors} -- randomized decision rules justified by preferences over decision rules. We make explicit that a loss-based posterior coincides with ordinary Bayes if and only if the loss is, up to scale and a data-only term, negative log-likelihood. We then show that generalized marginal likelihood is not evidence for decision posteriors, and Bayes factors are not well-defined without additional structure. In the decision posterior regime, non-degenerate posteriors require nonlinear preferences over decision rules. Under sequential coherence and separability, these lead to an entropy-penalized variational representation yielding generalized Bayes as the optimal rule.
Deep Multivariate Models with Parametric Conditionals
Schlesinger, Dmitrij, Flach, Boris, Shekhovtsov, Alexander
We consider deep multivariate models for heterogeneous collections of random variables. In the context of computer vision, such collections may e.g. consist of images, segmentations, image attributes, and latent variables. When developing such models, most existing works start from an application task and design the model components and their dependencies to meet the needs of the chosen task. This has the disadvantage of limiting the applicability of the resulting model for other downstream tasks. Here, instead, we propose to represent the joint probability distribution by means of conditional probability distributions for each group of variables conditioned on the rest. Such models can then be used for practically any possible downstream task. Their learning can be approached as training a parametrised Markov chain kernel by maximising the data likelihood of its limiting distribution. This has the additional advantage of allowing a wide range of semi-supervised learning scenarios.
Density-Informed Pseudo-Counts for Calibrated Evidential Deep Learning
Carlotti, Pietro, Gligiฤ, Nevena, Farahi, Arya
Evidential Deep Learning (EDL) is a popular framework for uncertainty-aware classification that models predictive uncertainty via Dirichlet distributions parameterized by neural networks. Despite its popularity, its theoretical foundations and behavior under distributional shift remain poorly understood. In this work, we provide a principled statistical interpretation by proving that EDL training corresponds to amortized variational inference in a hierarchical Bayesian model with a tempered pseudo-likelihood. This perspective reveals a major drawback: standard EDL conflates epistemic and aleatoric uncertainty, leading to systematic overconfidence on out-of-distribution (OOD) inputs. To address this, we introduce Density-Informed Pseudo-count EDL (DIP-EDL), a new parametrization that decouples class prediction from the magnitude of uncertainty by separately estimating the conditional label distribution and the marginal covariate density. This separation preserves evidence in high-density regions while shrinking predictions toward a uniform prior for OOD data. Theoretically, we prove that DIP-EDL achieves asymptotic concentration. Empirically, we show that our method enhances interpretability and improves robustness and uncertainty calibration under distributional shift.
Simulation-based Bayesian inference with ameliorative learned summary statistics -- Part I
This paper, which is Part 1 of a two-part paper series, considers a simulation-based inference with learned summary statistics, in which such a learned summary statistic serves as an empirical-likelihood with ameliorative effects in the Bayesian setting, when the exact likelihood function associated with the observation data and the simulation model is difficult to obtain in a closed form or computationally intractable. In particular, a transformation technique which leverages the Cressie-Read discrepancy criterion under moment restrictions is used for summarizing the learned statistics between the observation data and the simulation outputs, while preserving the statistical power of the inference. Here, such a transformation of data-to-learned summary statistics also allows the simulation outputs to be conditioned on the observation data, so that the inference task can be performed over certain sample sets of the observation data that are considered as an empirical relevance or believed to be particular importance. Moreover, the simulation-based inference framework discussed in this paper can be extended further, and thus handling weakly dependent observation data. Finally, we remark that such an inference framework is suitable for implementation in distributed computing, i.e., computational tasks involving both the data-to-learned summary statistics and the Bayesian inferencing problem can be posed as a unified distributed inference problem that will exploit distributed optimization and MCMC algorithms for supporting large datasets associated with complex simulation models.
OneFlowSBI: One Model, Many Queries for Simulation-Based Inference
Nautiyal, Mayank, Ju, Li, Ernfors, Melker, Hagland, Klara, Holma, Ville, Sรถderholm, Maximilian Werkรถ, Hellander, Andreas, Singh, Prashant
We introduce \textit{OneFlowSBI}, a unified framework for simulation-based inference that learns a single flow-matching generative model over the joint distribution of parameters and observations. Leveraging a query-aware masking distribution during training, the same model supports multiple inference tasks, including posterior sampling, likelihood estimation, and arbitrary conditional distributions, without task-specific retraining. We evaluate \textit{OneFlowSBI} on ten benchmark inference problems and two high-dimensional real-world inverse problems across multiple simulation budgets. \textit{OneFlowSBI} is shown to deliver competitive performance against state-of-the-art generalized inference solvers and specialized posterior estimators, while enabling efficient sampling with few ODE integration steps and remaining robust under noisy and partially observed data.
Nested Slice Sampling: Vectorized Nested Sampling for GPU-Accelerated Inference
Yallup, David, Kroupa, Namu, Handley, Will
Model comparison and calibrated uncertainty quantification often require integrating over parameters, but scalable inference can be challenging for complex, multimodal targets. Nested Sampling is a robust alternative to standard MCMC, yet its typically sequential structure and hard constraints make efficient accelerator implementations difficult. This paper introduces Nested Slice Sampling (NSS), a GPU-friendly, vectorized formulation of Nested Sampling that uses Hit-and-Run Slice Sampling for constrained updates. A tuning analysis yields a simple near-optimal rule for setting the slice width, improving high-dimensional behavior and making per-step compute more predictable for parallel execution. Experiments on challenging synthetic targets, high dimensional Bayesian inference, and Gaussian process hyperparameter marginalization show that NSS maintains accurate evidence estimates and high-quality posterior samples, and is particularly robust on difficult multimodal problems where current state-of-the-art methods such as tempered SMC baselines can struggle. An open-source implementation is released to facilitate adoption and reproducibility.