Bayesian Inference
Graph-Structured Topic Modeling for Documents with Spatial or Covariate Dependencies
We address the challenge of incorporating document-level metadata into topic modeling to improve topic mixture estimation. To overcome the computational complexity and lack of theoretical guarantees in existing Bayesian methods, we extend probabilistic latent semantic indexing (pLSI), a frequentist framework for topic modeling, by incorporating document-level covariates or known similarities between documents through a graph formalism. Modeling documents as nodes and edges denoting similarities, we propose a new estimator based on a fast graph-regularized iterative singular value decomposition (SVD) that encourages similar documents to share similar topic mixture proportions. We characterize the estimation error of our proposed method by deriving high-probability bounds and develop a specialized cross-validation method to optimize our regularization parameters. We validate our model through comprehensive experiments on synthetic datasets and three real-world corpora, demonstrating improved performance and faster inference compared to existing Bayesian methods.
RDPI: A Refine Diffusion Probability Generation Method for Spatiotemporal Data Imputation
Liu, Zijin, Zhao, Xiang, Song, You
Spatiotemporal data imputation plays a crucial role in various fields such as traffic flow monitoring, air quality assessment, and climate prediction. However, spatiotemporal data collected by sensors often suffer from temporal incompleteness, and the sparse and uneven distribution of sensors leads to missing data in the spatial dimension. Among existing methods, autoregressive approaches are prone to error accumulation, while simple conditional diffusion models fail to adequately capture the spatiotemporal relationships between observed and missing data. To address these issues, we propose a novel two-stage Refined Diffusion Probability Impuation (RDPI) framework based on an initial network and a conditional diffusion model. In the initial stage, deterministic imputation methods are used to generate preliminary estimates of the missing data. In the refinement stage, residuals are treated as the diffusion target, and observed values are innovatively incorporated into the forward process. This results in a conditional diffusion model better suited for spatiotemporal data imputation, bridging the gap between the preliminary estimates and the true values. Experiments on multiple datasets demonstrate that RDPI not only achieves state-of-the-art imputation accuracy but also significantly reduces sampling computational costs.
Ask for More Than Bayes Optimal: A Theory of Indecisions for Classification
Ndaoud, Mohamed, Radchenko, Peter, Rava, Bradley
In this work, we address the problem of controlling a classifier's accuracy at any user-specified level through selective classification, regardless of the problem's inherent difficulty. Traditional classification frameworks are designed to approximate the Bayes optimal error rate as closely as possible. However, with the growing deployment of artificial intelligence (AI) systems in automated, high-stakes decision-making, it has become critical to ensure reliable control over a classifier's accuracy and to guarantee accurate predictions for all individuals. When the underlying problem is truly difficult, as indicated by the distance between the true distributions for each decision class, achieving control over the error rate of an automated decisionmaking system may be impossible. This is particularly true when the number of potential classes is large or when the distributions of these classes are close enough, significantly increasing the difficulty of the problem. This phenomenon is illustrated in Figure 1, where the task is to classify various observations as High-Risk or Low-Risk, while maintaining an error rate below 5%. In this example, the High-Risk and Low-Risk classes are modeled as mixtures of two normal distributions with means of 2 and 1, respectively, and a shared variance of 1. The Bayes classifier is represented by the dotted line in the leftmost plot of Figure 1. In this scenario, the Bayes optimal error rate is 15.9%, significantly exceeding our target classification error of 5%.
Adaptive Nonparametric Perturbations of Parametric Bayesian Models
Wu, Bohan, Weinstein, Eli N., Salehi, Sohrab, Wang, Yixin, Blei, David M.
Parametric Bayesian modeling offers a powerful and flexible toolbox for scientific data analysis. Yet the model, however detailed, may still be wrong, and this can make inferences untrustworthy. In this paper we study nonparametrically perturbed parametric (NPP) Bayesian models, in which a parametric Bayesian model is relaxed via a distortion of its likelihood. We analyze the properties of NPP models when the target of inference is the true data distribution or some functional of it, such as in causal inference. We show that NPP models can offer the robustness of nonparametric models while retaining the data efficiency of parametric models, achieving fast convergence when the parametric model is close to true. To efficiently analyze data with an NPP model, we develop a generalized Bayes procedure to approximate its posterior. We demonstrate our method by estimating causal effects of gene expression from single cell RNA sequencing data. NPP modeling offers an efficient approach to robust Bayesian inference and can be used to robustify any parametric Bayesian model.
Exploring Diffusion and Flow Matching Under Generator Matching
Patel, Zeeshan, DeLoye, James, Mathias, Lance
Recent techniques in deep generative modeling have leveraged Markov generative processes to learn complex, high-dimensional probability distributions in a more structured and flexible manner [17]. By integrating Markov chain methods with deep neural architectures, these approaches aim to exploit the representational power of deep networks while maintaining a tractable and theoretically grounded training procedure. In contrast to early generative models that relied heavily on direct maximum likelihood estimation or adversarial objectives, this class of methods employs iterative stochastic transformations--often expressed as Markovian updates--to gradually refine initial noise samples into samples drawn from the desired target distribution. Diffusion and flow matching models represent two prominent classes of generative approaches that construct data samples through a sequence of continuous transformations. Diffusion models [6, 13] introduce a forward-noising and reverse-denoising process, progressively refining a simple noise distribution into a complex target distribution by learning to undo incremental noise corruption at each step.
Statistical learning does not always entail knowledge
Díaz-Pachón, Daniel Andrés, Gallegos, H. Renata, Hössjer, Ola, Rao, J. Sunil
In this paper, we study learning and knowledge acquisition (LKA) of an agent about a proposition that is either true or false. We use a Bayesian approach, where the agent receives data to update his beliefs about the proposition according to a posterior distribution. The LKA is formulated in terms of active information, with data representing external or exogenous information that modifies the agent's beliefs. It is assumed that data provide details about a number of features that are relevant to the proposition. We show that this leads to a Gibbs distribution posterior, which is in maximum entropy relative to the prior, conditioned on the side constraints that the data provide in terms of the features. We demonstrate that full learning is sometimes not possible and full knowledge acquisition is never possible when the number of extracted features is too small. We also distinguish between primary learning (receiving data about features of relevance for the proposition) and secondary learning (receiving data about the learning of another agent). We argue that this type of secondary learning does not represent true knowledge acquisition. Our results have implications for statistical learning algorithms, and we claim that such algorithms do not always generate true knowledge. The theory is illustrated with several examples.
Generalized Bayesian deep reinforcement learning
Roy, Shreya Sinha, Everitt, Richard G., Robert, Christian P., Dutta, Ritabrata
Bayesian reinforcement learning (BRL) is a method that merges principles from Bayesian statistics and reinforcement learning to make optimal decisions in uncertain environments. Similar to other model-based RL approaches, it involves two key components: (1) Inferring the posterior distribution of the data generating process (DGP) modeling the true environment and (2) policy learning using the learned posterior. We propose to model the dynamics of the unknown environment through deep generative models assuming Markov dependence. In absence of likelihood functions for these models we train them by learning a generalized predictive-sequential (or prequential) scoring rule (SR) posterior. We use sequential Monte Carlo (SMC) samplers to draw samples from this generalized Bayesian posterior distribution. In conjunction, to achieve scalability in the high dimensional parameter space of the neural networks, we use the gradient based Markov chain Monte Carlo (MCMC) kernels within SMC. To justify the use of the prequential scoring rule posterior we prove a Bernstein-von Misses type theorem. For policy learning, we propose expected Thompson sampling (ETS) to learn the optimal policy by maximizing the expected value function with respect to the posterior distribution. This improves upon traditional Thompson sampling (TS) and its extensions which utilize only one sample drawn from the posterior distribution. This improvement is studied both theoretically and using simulation studies assuming discrete action and state-space. Finally we successfully extend our setup for a challenging problem with continuous action space without theoretical guarantees.
BA-BFL: Barycentric Aggregation for Bayesian Federated Learning
Jamoussi, Nour, Serra, Giuseppe, Stavrou, Photios A., Kountouris, Marios
In this work, we study the problem of aggregation in the context of Bayesian Federated Learning (BFL). Using an information geometric perspective, we interpret the BFL aggregation step as finding the barycenter of the trained posteriors for a pre-specified divergence metric. We study the barycenter problem for the parametric family of $\alpha$-divergences and, focusing on the standard case of independent and Gaussian distributed parameters, we recover the closed-form solution of the reverse Kullback-Leibler barycenter and develop the analytical form of the squared Wasserstein-2 barycenter. Considering a non-IID setup, where clients possess heterogeneous data, we analyze the performance of the developed algorithms against state-of-the-art (SOTA) Bayesian aggregation methods in terms of accuracy, uncertainty quantification (UQ), model calibration (MC), and fairness. Finally, we extend our analysis to the framework of Hybrid Bayesian Deep Learning (HBDL), where we study how the number of Bayesian layers in the architecture impacts the considered performance metrics. Our experimental results show that the proposed methodology presents comparable performance with the SOTA while offering a geometric interpretation of the aggregation phase.
Extrapolating Jet Radiation with Autoregressive Transformers
Butter, Anja, Charton, François, Villadamigo, Javier Mariño, Ore, Ayodele, Plehn, Tilman, Spinner, Jonas
Generative networks are an exciting tool for fast LHC event generation. Usually, they are used to generate configurations with a fixed number of particles. Autoregressive transformers allow us to generate events with variable numbers of particles, very much in line with the physics of QCD jet radiation. We show how they can learn a factorized likelihood for jet radiation and extrapolate in terms of the number of generated jets. For this extrapolation, bootstrapping training data and training with modifications of the likelihood loss can be used.
Improving Cooperation in Language Games with Bayesian Inference and the Cognitive Hierarchy
Bills, Joseph, Archibald, Christopher, Blaylock, Diego
In two-player cooperative games, agents can play together effectively when they have accurate assumptions about how their teammate will behave, but may perform poorly when these assumptions are inaccurate. In language games, failure may be due to disagreement in the understanding of either the semantics or pragmatics of an utterance. We model coarse uncertainty in semantics using a prior distribution of language models and uncertainty in pragmatics using the cognitive hierarchy, combining the two aspects into a single prior distribution over possible partner types. Fine-grained uncertainty in semantics is modeled using noise that is added to the embeddings of words in the language. To handle all forms of uncertainty we construct agents that learn the behavior of their partner using Bayesian inference and use this information to maximize the expected value of a heuristic function. We test this approach by constructing Bayesian agents for the game of Codenames, and show that they perform better in experiments where semantics is uncertain