Goto

Collaborating Authors

 Bayesian Inference


Strong posterior contraction rates via Wasserstein dynamics

arXiv.org Machine Learning

In Bayesian statistics, posterior contraction rates (PCRs) quantify the speed at which the posterior distribution concentrates on arbitrarily small neighborhoods of a true model, in a suitable way, as the sample size goes to infinity. In this paper, we develop a new approach to PCRs, with respect to strong norm distances on parameter spaces of functions. Critical to our approach is the combination of a local Lipschitz-continuity for the posterior distribution with a dynamic formulation of the Wasserstein distance, which allows to set forth an interesting connection between PCRs and some classical problems arising in mathematical analysis, probability and statistics, e.g., Laplace methods for approximating integrals, Sanov's large deviation principles in the Wasserstein distance, rates of convergence of mean Glivenko-Cantelli theorems, and estimates of weighted Poincar\'e-Wirtinger constants. We first present a theorem on PCRs for a model in the regular infinite-dimensional exponential family, which exploits sufficient statistics of the model, and then extend such a theorem to a general dominated model. These results rely on the development of novel techniques to evaluate Laplace integrals and weighted Poincar\'e-Wirtinger constants in infinite-dimension, which are of independent interest. The proposed approach is applied to the regular parametric model, the multinomial model, the finite-dimensional and the infinite-dimensional logistic-Gaussian model and the infinite-dimensional linear regression. In general, our approach leads to optimal PCRs in finite-dimensional models, whereas for infinite-dimensional models it is shown explicitly how the prior distribution affect PCRs.


Stochastic PDE representation of random fields for large-scale Gaussian process regression and statistical finite element analysis

arXiv.org Machine Learning

The efficient representation of random fields on geometrically complex domains is crucial for Bayesian modelling in engineering and machine learning. Today's prevalent random field representations are either intended for unbounded domains or are too restrictive in terms of possible field properties. Because of these limitations, techniques leveraging the historically established link between stochastic PDEs (SPDEs) and random fields have been gaining interest. The SPDE representation is especially appealing for engineering applications which already have a finite element discretisation for solving the physical conservation equations. In contrast to the dense covariance matrix of a random field, its inverse, the precision matrix, is usually sparse and equal to the stiffness matrix of an elliptic SPDE. We use the SPDE representation to develop a scalable framework for large-scale statistical finite element analysis and Gaussian process (GP) regression on complex geometries. The statistical finite element method (statFEM) introduced by Girolami et al. (2022) is a novel approach for synthesising measurement data and finite element models. In both statFEM and GP regression, we use the SPDE formulation to obtain the relevant prior probability densities with a sparse precision matrix. The properties of the priors are governed by the parameters and possibly fractional order of the SPDE so that we can model on bounded domains and manifolds anisotropic, non-stationary random fields with arbitrary smoothness. The observation models for statFEM and GP regression are such that the posterior probability densities are Gaussians with a closed-form mean and precision. The respective mean vector and precision matrix and can be evaluated using only sparse matrix operations. We demonstrate the versatility of the proposed framework and its convergence properties with Poisson and thin-shell examples.


Aggregating Correlated Estimations with (Almost) no Training

arXiv.org Artificial Intelligence

Many decision problems cannot be solved exactly and use several estimation algorithms that assign scores to the different available options. The estimation errors can have various correlations, from low (e.g. between two very different approaches) to high (e.g. when using a given algorithm with different hyperparameters). Most aggregation rules would suffer from this diversity of correlations. In this article, we propose different aggregation rules that take correlations into account, and we compare them to naive rules in various experiments based on synthetic data. Our results show that when sufficient information is known about the correlations between errors, a maximum likelihood aggregation should be preferred. Otherwise, typically with limited training data, we recommend a method that we call Embedded Voting (EV).


Fair Ranking under Disparate Uncertainty

arXiv.org Artificial Intelligence

Ranking is a ubiquitous method for focusing the attention of human evaluators on a manageable subset of options. Its use ranges from surfacing potentially relevant products on an e-commerce site to prioritizing college applications for human review. While ranking can make human evaluation far more effective by focusing attention on the most promising options, we argue that it can introduce unfairness if the uncertainty of the underlying relevance model differs between groups of options. Unfortunately, such disparity in uncertainty appears widespread, since the relevance estimates for minority groups tend to have higher uncertainty due to a lack of data or appropriate features. To overcome this fairness issue, we propose Equal-Opportunity Ranking (EOR) as a new fairness criterion for ranking that provably corrects for the disparity in uncertainty between groups. Furthermore, we present a practical algorithm for computing EOR rankings in time $O(n \log(n))$ and prove its close approximation guarantee to the globally optimal solution. In a comprehensive empirical evaluation on synthetic data, a US Census dataset, and a real-world case study of Amazon search queries, we find that the algorithm reliably guarantees EOR fairness while providing effective rankings.


Federated cINN Clustering for Accurate Clustered Federated Learning

arXiv.org Artificial Intelligence

Federated Learning (FL) presents an innovative approach to privacy-preserving distributed machine learning and enables efficient crowd intelligence on a large scale. However, a significant challenge arises when coordinating FL with crowd intelligence which diverse client groups possess disparate objectives due to data heterogeneity or distinct tasks. To address this challenge, we propose the Federated cINN Clustering Algorithm (FCCA) to robustly cluster clients into different groups, avoiding mutual interference between clients with data heterogeneity, and thereby enhancing the performance of the global model. Specifically, FCCA utilizes a global encoder to transform each client's private data into multivariate Gaussian distributions. It then employs a generative model to learn encoded latent features through maximum likelihood estimation, which eases optimization and avoids mode collapse. Finally, the central server collects converged local models to approximate similarities between clients and thus partition them into distinct clusters. Extensive experimental results demonstrate FCCA's superiority over other state-of-the-art clustered federated learning algorithms, evaluated on various models and datasets. These results suggest that our approach has substantial potential to enhance the efficiency and accuracy of real-world federated learning tasks.


Efficient computation of predictive probabilities in probit models via expectation propagation

arXiv.org Machine Learning

Binary regression models represent a popular model-based approach for binary classification. In the Bayesian framework, computational challenges in the form of the posterior distribution motivate still-ongoing fruitful research. Here, we focus on the computation of predictive probabilities in Bayesian probit models via expectation propagation (EP). Leveraging more general results in recent literature, we show that such predictive probabilities admit a closed-form expression. Improvements over state-of-the-art approaches are shown in a simulation study.


Accelerating Markov Chain Monte Carlo sampling with diffusion models

arXiv.org Machine Learning

Global fits of physics models require efficient methods for exploring high-dimensional and/or multimodal posterior functions. We introduce a novel method for accelerating Markov Chain Monte Carlo (MCMC) sampling by pairing a Metropolis-Hastings algorithm with a diffusion model that can draw global samples with the aim of approximating the posterior. We briefly review diffusion models in the context of image synthesis before providing a streamlined diffusion model tailored towards low-dimensional data arrays. We then present our adapted Metropolis-Hastings algorithm which combines local proposals with global proposals taken from a diffusion model that is regularly trained on the samples produced during the MCMC run. Our approach leads to a significant reduction in the number of likelihood evaluations required to obtain an accurate representation of the Bayesian posterior across several analytic functions, as well as for a physical example based on a global analysis of parton distribution functions. Our method is extensible to other MCMC techniques, and we briefly compare our method to similar approaches based on normalizing flows. A code implementation can be found at https://github.com/NickHunt-Smith/MCMC-diffusion.


Bayesian inference of composition-dependent phase diagrams

arXiv.org Artificial Intelligence

Phase diagrams serve as a highly informative tool for materials design, encapsulating information about the phases that a material can manifest under specific conditions. In this work, we develop a method in which Bayesian inference is employed to combine thermodynamic data from molecular dynamics (MD), melting point simulations, and phonon calculations, process these data, and yield a temperature-concentration phase diagram. The employed Bayesian framework yields us not only the free energies of different phases as functions of temperature and concentration but also the uncertainties of these free energies originating from statistical errors inherent to finite-length MD trajectories. Furthermore, it extrapolates the results of the finite-atom calculations to the infinite-atom limit and facilitates the choice of temperature, chemical potentials, and the number of atoms conducting the next simulation with which will be the most efficient in reducing the uncertainty of the phase diagram. The developed algorithm was successfully tested on two binary systems, Ge-Si and K-Na, in the full range of concentrations and temperatures.


Cognition-Mode Aware Variational Representation Learning Framework for Knowledge Tracing

arXiv.org Artificial Intelligence

The Knowledge Tracing (KT) task plays a crucial role in personalized learning, and its purpose is to predict student responses based on their historical practice behavior sequence. However, the KT task suffers from data sparsity, which makes it challenging to learn robust representations for students with few practice records and increases the risk of model overfitting. Therefore, in this paper, we propose a Cognition-Mode Aware Variational Representation Learning Framework (CMVF) that can be directly applied to existing KT methods. Our framework uses a probabilistic model to generate a distribution for each student, accounting for uncertainty in those with limited practice records, and estimate the student's distribution via variational inference (VI). In addition, we also introduce a cognition-mode aware multinomial distribution as prior knowledge that constrains the posterior student distributions learning, so as to ensure that students with similar cognition modes have similar distributions, avoiding overwhelming personalization for students with few practice records. At last, extensive experimental results confirm that CMVF can effectively aid existing KT methods in learning more robust student representations. Our code is available at https://github.com/zmy-9/CMVF.


Bayesian sparsity and class sparsity priors for dictionary learning and coding

arXiv.org Machine Learning

Dictionary learning methods continue to gain popularity for the solution of challenging inverse problems. In the dictionary learning approach, the computational forward model is replaced by a large dictionary of possible outcomes, and the problem is to identify the dictionary entries that best match the data, akin to traditional query matching in search engines. Sparse coding techniques are used to guarantee that the dictionary matching identifies only few of the dictionary entries, and dictionary compression methods are used to reduce the complexity of the matching problem. In this article, we propose a work flow to facilitate the dictionary matching process. First, the full dictionary is divided into subdictionaries that are separately compressed. The error introduced by the dictionary compression is handled in the Bayesian framework as a modeling error. Furthermore, we propose a new Bayesian data-driven group sparsity coding method to help identify subdictionaries that are not relevant for the dictionary matching. After discarding irrelevant subdictionaries, the dictionary matching is addressed as a deflated problem using sparse coding. The compression and deflation steps can lead to substantial decreases of the computational complexity. The effectiveness of compensating for the dictionary compression error and using the novel group sparsity promotion to deflate the original dictionary are illustrated by applying the methodology to real world problems, the glitch detection in the LIGO experiment and hyperspectral remote sensing.