Bayesian Learning
Bayesian Inference for Correlated Human Experts and Classifiers
Kelly, Markelle, Boyd, Alex, Showalter, Sam, Steyvers, Mark, Smyth, Padhraic
Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions relative to baselines in the cost of querying human experts while maintaining high prediction accuracy.
Amortized variational transdimensional inference
Davies, Laurence, Mackinlay, Dan, Oliveira, Rafael, Sisson, Scott A.
The expressiveness of flow-based models combined with stochastic variational inference (SVI) has, in recent years, expanded the application of optimization-based Bayesian inference to include problems with complex data relationships. However, until now, SVI using flow-based models has been limited to problems of fixed dimension. We introduce CoSMIC, normalizing flows (COntextually-Specified Masking for Identity-mapped Components), an extension to neural autoregressive conditional normalizing flow architectures that enables using a single amortized variational density for inference over a transdimensional target distribution. We propose a combined stochastic variational transdimensional inference (VTI) approach to training CoSMIC flows using techniques from Bayesian optimization and Monte Carlo gradient estimation. Numerical experiments demonstrate the performance of VTI on challenging problems that scale to high-cardinality model spaces.
Savage-Dickey density ratio estimation with normalizing flows for Bayesian model comparison
Lin, Kiyam, Polanska, Alicja, Piras, Davide, Mancini, Alessio Spurio, McEwen, Jason D.
A core motivation of science is to evaluate which scientific model best explains observed data. Bayesian model comparison provides a principled statistical approach to comparing scientific models and has found widespread application within cosmology and astrophysics. Calculating the Bayesian evidence is computationally challenging, especially as we continue to explore increasingly more complex models. The Savage-Dickey density ratio (SDDR) provides a method to calculate the Bayes factor (evidence ratio) between two nested models using only posterior samples from the super model. The SDDR requires the calculation of a normalised marginal distribution over the extra parameters of the super model, which has typically been performed using classical density estimators, such as histograms. Classical density estimators, however, can struggle to scale to high-dimensional settings. We introduce a neural SDDR approach using normalizing flows that can scale to settings where the super model contains a large number of extra parameters. We demonstrate the effectiveness of this neural SDDR methodology applied to both toy and realistic cosmological examples. For a field-level inference setting, we show that Bayes factors computed for a Bayesian hierarchical model (BHM) and simulation-based inference (SBI) approach are consistent, providing further validation that SBI extracts as much cosmological information from the field as the BHM approach. The SDDR estimator with normalizing flows is implemented in the open-source harmonic Python package.
Distributional encoding for Gaussian process regression with qualitative inputs
Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a highly prevailing method for the optimization of black-box functions. However, when all or some input variables are categorical, building a predictive and computationally efficient GP remains challenging. Starting from the naive target encoding idea, where the original categorical values are replaced with the mean of the target variable for that category, we propose a generalization based on distributional encoding (DE) which makes use of all samples of the target variable for a category. To handle this type of encoding inside the GP, we build upon recent results on characteristic kernels for probability distributions, based on the maximum mean discrepancy and the Wasserstein distance. We also discuss several extensions for classification, multi-task learning and incorporation or auxiliary information. Our approach is validated empirically, and we demonstrate state-of-the-art predictive performance on a variety of synthetic and real-world datasets. DE is naturally complementary to recent advances in BO over discrete and mixed-spaces.
Solving Inverse Problems via Diffusion-Based Priors: An Approximation-Free Ensemble Sampling Approach
Chen, Haoxuan, Ren, Yinuo, Min, Martin Renqiang, Ying, Lexing, Izzo, Zachary
Diffusion models (DMs) have proven to be effective in modeling high-dimensional distributions, leading to their widespread adoption for representing complex priors in Bayesian inverse problems (BIPs). However, current DM-based posterior sampling methods proposed for solving common BIPs rely on heuristic approximations to the generative process. To exploit the generative capability of DMs and avoid the usage of such approximations, we propose an ensemble-based algorithm that performs posterior sampling without the use of heuristic approximations. Our algorithm is motivated by existing works that combine DM-based methods with the sequential Monte Carlo (SMC) method. By examining how the prior evolves through the diffusion process encoded by the pre-trained score function, we derive a modified partial differential equation (PDE) governing the evolution of the corresponding posterior distribution. This PDE includes a modified diffusion term and a reweighting term, which can be simulated via stochastic weighted particle methods. Theoretically, we prove that the error between the true posterior distribution can be bounded in terms of the training error of the pre-trained score function and the number of particles in the ensemble. Empirically, we validate our algorithm on several inverse problems in imaging to show that our method gives more accurate reconstructions compared to existing DM-based methods.
Enhanced Drought Analysis in Bangladesh: A Machine Learning Approach for Severity Classification Using Satellite Data
Paul, Tonmoy, Mati, Mrittika Devi, Islam, Md. Mahmudul
Drought poses a pervasive environmental challenge in Bangladesh, impacting agriculture, socio-economic stability, and food security due to its unique geographic and anthropogenic vulnerabilities. Traditional drought indices, such as the Standardized Precipitation Index (SPI) and Palmer Drought Severity Index (PDSI), often overlook crucial factors like soil moisture and temperature, limiting their resolution. Moreover, current machine learning models applied to drought prediction have been underexplored in the context of Bangladesh, lacking a comprehensive integration of satellite data across multiple districts. To address these gaps, we propose a satellite data-driven machine learning framework to classify drought across 38 districts of Bangladesh. Using unsupervised algorithms like K-means and Bayesian Gaussian Mixture for clustering, followed by classification models such as KNN, Random Forest, Decision Tree, and Naive Bayes, the framework integrates weather data (humidity, soil moisture, temperature) from 2012-2024. This approach successfully classifies drought severity into different levels. However, it shows significant variabilities in drought vulnerabilities across regions which highlights the aptitude of machine learning models in terms of identifying and predicting drought conditions.
Improved Uncertainty Quantification in Physics-Informed Neural Networks Using Error Bounds and Solution Bundles
Flores, Pablo, Graf, Olga, Protopapas, Pavlos, Pichara, Karim
Physics-Informed Neural Networks (PINNs) have been widely used to obtain solutions to various physical phenomena modeled as Differential Equations. As PINNs are not naturally equipped with mechanisms for Uncertainty Quantification, some work has been done to quantify the different uncertainties that arise when dealing with PINNs. In this paper, we use a two-step procedure to train Bayesian Neural Networks that provide uncertainties over the solutions to differential equation systems provided by PINNs. We use available error bounds over PINNs to formulate a heteroscedastic variance that improves the uncertainty estimation. Furthermore, we solve forward problems and utilize the obtained uncertainties when doing parameter estimation in inverse problems in cosmology.
Bayes Error Rate Estimation in Difficult Situations
Wheat, Lesley, Mohrenschildt, Martin v., Habibi, Saeid
The Bayes Error Rate (BER) is the fundamental limit on the achievable generalizable classification accuracy of any machine learning model due to inherent uncertainty within the data. BER estimators offer insight into the difficulty of any classification problem and set expectations for optimal classification performance. In order to be useful, the estimators must also be accurate with a limited number of samples on multivariate problems with unknown class distributions. To determine which estimators meet the minimum requirements for "usefulness", an in-depth examination of their accuracy is conducted using Monte Carlo simulations with synthetic data in order to obtain their confidence bounds for binary classification. To examine the usability of the estimators on real-world applications, new test scenarios are introduced upon which 2500 Monte Carlo simulations per scenario are run over a wide range of BER values. In a comparison of k-Nearest Neighbor (kNN), Generalized Henze-Penrose (GHP) divergence and Kernel Density Estimation (KDE) techniques, results show that kNN is overwhelmingly the more accurate non-parametric estimator. In order to reach the target of an under 5 percent range for the 95 percent confidence bounds, the minimum number of required samples per class is 1000. As more features are added, more samples are needed, so that 2500 samples per class are required at only 4 features. Other estimators do become more accurate than kNN as more features are added, but continuously fail to meet the target range.
A kernel conditional two-sample test
Massiani, Pierre-Franรงois, Fiedler, Christian, Haverbeck, Lukas, Solowjow, Friedrich, Trimpe, Sebastian
We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct conditional two-sample statistical tests. These tests identify the inputs -- called covariates in this context -- where two conditional expectations differ with high probability. Our key idea is to transform confidence bounds of a learning method into a conditional two-sample test, and we instantiate this principle for kernel ridge regression (KRR) and conditional kernel mean embeddings. We generalize existing pointwise-in-time or time-uniform confidence bounds for KRR to previously-inaccessible yet essential cases such as infinite-dimensional outputs with non-trace-class kernels. These bounds enable circumventing the need for independent data in our statistical tests, since they allow online sampling. We also introduce bootstrapping schemes leveraging the parametric form of testing thresholds identified in theory to avoid tuning inaccessible parameters, making our method readily applicable in practice. Such conditional two-sample tests are especially relevant in applications where data arrive sequentially or non-independently, or when output distributions vary with operational parameters. We demonstrate their utility through examples in process monitoring and comparison of dynamical systems. Overall, our results establish a comprehensive foundation for conditional two-sample testing, from theoretical guarantees to practical implementation, and advance the state-of-the-art on the concentration of vector-valued least squares estimation.
Position: There Is No Free Bayesian Uncertainty Quantification
Melev, Ivan, Kauermann, Goeran
Due to their intuitive appeal, Bayesian methods of modeling and uncertainty quantification have become popular in modern machine and deep learning. When providing a prior distribution over the parameter space, it is straightforward to obtain a distribution over the parameters that is conventionally interpreted as uncertainty quantification of the model. We challenge the validity of such Bayesian uncertainty quantification by discussing the equivalent optimization-based representation of Bayesian updating, provide an alternative interpretation that is coherent with the optimization-based perspective, propose measures of the quality of the Bayesian inferential stage, and suggest directions for future work.