 Bayesian Inference


Progressive Tempering Sampler with Diffusion

arXiv.org Machine Learning

Recent research has focused on designing neural samplers that amortize the process of sampling from unnormalized densities. However, despite significant advancements, they still fall short of the state-of-the-art MCMC approach, Parallel Tempering (PT), when it comes to the efficiency of target evaluations. On the other hand, unlike a well-trained neural sampler, PT yields only dependent samples and needs to be rerun -- at considerable computational cost -- whenever new samples are required. To address these weaknesses, we propose the Progressive Tempering Sampler with Diffusion (PTSD), which trains diffusion models sequentially across temperatures, leveraging the advantages of PT to improve the training of neural samplers. We also introduce a novel method to combine high-temperature diffusion models to generate approximate lower-temperature samples, which are minimally refined using MCMC and used to train the next diffusion model. PTSD enables efficient reuse of sample information across temperature levels while generating well-mixed, uncorrelated samples. Our method significantly improves target evaluation efficiency, outperforming diffusion-based neural samplers.
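For readers unfamiliar with Parallel Tempering, the following is a minimal sketch of the classical PT baseline the abstract refers to, not of PTSD itself. The two-mode target, the inverse-temperature ladder `betas`, and the function names are illustrative assumptions.

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of an illustrative two-mode target (modes at -4 and +4).
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def parallel_tempering(n_steps, betas, step_size=1.0, seed=0):
    # betas[0] should be 1.0 (the target); smaller values give hotter, flatter chains.
    rng = random.Random(seed)
    states = [0.0] * len(betas)
    cold_samples = []
    for _ in range(n_steps):
        # Local move: random-walk Metropolis on each tempered density pi(x)^beta.
        for k, beta in enumerate(betas):
            proposal = states[k] + rng.gauss(0.0, step_size)
            log_ratio = beta * (log_target(proposal) - log_target(states[k]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[k] = proposal
        # Swap move: propose exchanging the states of one random adjacent pair.
        k = rng.randrange(len(betas) - 1)
        log_ratio = (betas[k] - betas[k + 1]) * (
            log_target(states[k + 1]) - log_target(states[k])
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            states[k], states[k + 1] = states[k + 1], states[k]
        cold_samples.append(states[0])
    return cold_samples
```

Consecutive entries of `cold_samples` are correlated, and producing more of them means running the chains again, which is the limitation of PT that the abstract points out.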


Position: There Is No Free Bayesian Uncertainty Quantification

arXiv.org Machine Learning

Due to their intuitive appeal, Bayesian methods of modeling and uncertainty quantification have become popular in modern machine and deep learning. When providing a prior distribution over the parameter space, it is straightforward to obtain a distribution over the parameters that is conventionally interpreted as uncertainty quantification of the model. We challenge the validity of such Bayesian uncertainty quantification by discussing the equivalent optimization-based representation of Bayesian updating, provide an alternative interpretation that is coherent with the optimization-based perspective, propose measures of the quality of the Bayesian inferential stage, and suggest directions for future work.


Normalizing Flows are Capable Models for RL

arXiv.org Artificial Intelligence

Modern reinforcement learning (RL) algorithms have found success by using powerful probabilistic models, such as transformers, energy-based models, and diffusion/flow-based models. To this end, RL researchers often choose to pay the price of accommodating these models into their algorithms -- diffusion models are expressive, but are computationally intensive due to their reliance on solving differential equations, while autoregressive transformer models are scalable but typically require learning discrete representations. Normalizing flows (NFs), by contrast, seem to provide an appealing alternative, as they enable likelihoods and sampling without solving differential equations or relying on autoregressive architectures. However, their potential in RL has received limited attention, partly due to the prevailing belief that normalizing flows lack sufficient expressivity. We show that this is not the case. Building on recent work in NFs, we propose a single NF architecture which integrates seamlessly into RL algorithms, serving as a policy, Q-function, and occupancy measure. Our approach leads to much simpler algorithms, and achieves higher performance in imitation learning, offline RL, goal-conditioned RL and unsupervised RL.
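The claim that normalizing flows give both likelihoods and samples in closed form can be illustrated with a single affine coupling layer in two dimensions. This is a generic sketch, not the architecture proposed in the paper; the `scale` and `shift` functions are hypothetical stand-ins for learned networks.

```python
import math
import random


def scale(x1):
    # Hypothetical log-scale function; a trained flow would use a neural network.
    return math.tanh(x1)


def shift(x1):
    # Hypothetical shift function.
    return 2.0 * x1


def to_latent(x):
    # Data -> latent direction, with the log-determinant of its Jacobian.
    x1, x2 = x
    z = (x1, (x2 - shift(x1)) * math.exp(-scale(x1)))
    return z, -scale(x1)


def from_latent(z):
    # Latent -> data direction: the exact inverse of to_latent.
    z1, z2 = z
    return (z1, z2 * math.exp(scale(z1)) + shift(z1))


def log_prob(x):
    # Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|.
    z, log_det = to_latent(x)
    base = -0.5 * (z[0] ** 2 + z[1] ** 2) - math.log(2.0 * math.pi)
    return base + log_det


def sample(rng):
    # Sampling is one pass through the inverse map; no ODE solver is involved.
    return from_latent((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
```

Both `log_prob` and `sample` cost a single evaluation of `scale` and `shift`, which is the property the abstract contrasts with diffusion and autoregressive models.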


Missing Data in Signal Processing and Machine Learning: Models, Methods and Modern Approaches

arXiv.org Machine Learning

Missing data appears when parts of the data are not available for a given variable or a given observation. It is a ubiquitous problem in a wide range of scientific disciplines, including sensor networks, geophysical data analysis, radar and image processing, remote sensing, ecological statistics and biomedical studies, to name just a few [1]-[5]. Signal processing is no exception to the rule, where missing data mainly come from sensor malfunction, hidden or impossible measurements, human errors and natural hazards, all of which can hinder a thorough understanding, analysis, and interpretation of the signal. One of the earliest works on missing data was published in 1932 by Wilks, who mentioned the need to extract as much information as possible from fragmentary answers to questionnaires in social sciences and government statistics. Therefore, it is not surprising that the first discipline to witness this issue was mathematical statistics. This led Wilks to derive efficient estimators for the parameters of a bivariate normal distribution when the data contain missing values [6]. This work was extended to the multivariate case by Lord in 1955 [7]. Since the early 1970s, the literature on missing data has flourished with the development of computational capacity, leading to major developments in signal processing and its related fields, such as statistical inference [2], data analysis [8] and machine learning [9]. In particular, the formulation of a missing-data theory framework by Rubin in [10], which describes the relation between missingness and data values in the so-called missing-data mechanisms, has allowed tremendous advancements in statistical analysis.
Therefore, a tutorial paper that summarizes the existing and novel strategies in the SP & ML literature addressing various problems related to missing data, such as parameter estimation, matrix completion, missing data imputation and learning with missing values, and that shows their potential applications, is urgently needed. This tutorial aims to provide practitioners with vital tools, in an accessible way, to answer the question: How to deal with missing data? There are many strategies to handle incomplete signals.
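Why the missing-data mechanism matters can be seen in a small simulation. The sketch below is an illustration and is not taken from the tutorial: it compares the mean of the observed values when missingness is independent of the data (commonly called MCAR) with the case where missingness depends on the unobserved value itself (commonly called MNAR). The logistic drop rule and all names are assumptions.

```python
import math
import random


def complete_case_means(n=10000, seed=0):
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]  # the true mean is 0
    # MCAR: each value is dropped with probability 0.5, regardless of its size.
    mcar = [v for v in values if rng.random() < 0.5]
    # MNAR: the drop probability 1 / (1 + exp(-2v)) grows with the value itself.
    mnar = [v for v in values if rng.random() > 1.0 / (1.0 + math.exp(-2.0 * v))]
    return sum(mcar) / len(mcar), sum(mnar) / len(mnar)
```

Under the first rule the complete-case mean stays near the true mean, while under the second it is pulled clearly below zero, so simply discarding incomplete records is only safe under the first mechanism.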


Discovery of Probabilistic Dirichlet-to-Neumann Maps on Graphs

arXiv.org Machine Learning

Dirichlet-to-Neumann maps enable the coupling of multiphysics simulations across computational subdomains by ensuring continuity of state variables and fluxes at artificial interfaces. We present a novel method for learning Dirichlet-to-Neumann maps on graphs using Gaussian processes, specifically for problems where the data obey a conservation constraint from an underlying partial differential equation. Our approach combines discrete exterior calculus and nonlinear optimal recovery to infer relationships between vertex and edge values. This framework yields data-driven predictions with uncertainty quantification across the entire graph, even when observations are limited to a subset of vertices and edges. By optimizing over the reproducing kernel Hilbert space norm while applying a maximum likelihood estimation penalty on kernel complexity, our method ensures that the resulting surrogate strictly enforces conservation laws without overfitting. We demonstrate our method on two representative applications: subsurface fracture networks and arterial blood flow. Our results show that the method maintains high accuracy and well-calibrated uncertainty estimates even under severe data scarcity, highlighting its potential for scientific applications where limited data and reliable uncertainty quantification are critical.


Online Bayesian system identification in multivariate autoregressive models via message passing

arXiv.org Machine Learning

In multivariate autoregressive models with exogenous inputs (MARX), the evolution of the signal incorporates past observations and controls, producing substantial uncertainty during parameter estimation. Bayesian inference procedures can quantify this uncertainty and propagate it towards future predictions [5], [6]. Quantified uncertainty is valuable on its own, but also useful for sensor fusion, optimal experimental design and adaptive control [7], [8], [9], [10], [11]. We present an exact recursive Bayesian estimator whose computation is distributed over a probabilistic graphical model. Bayesian inference in multivariate autoregressive models has a rich history, especially in econometrics [1], [3].
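The idea of an exact recursive Bayesian estimator can be sketched on a much simpler case than the paper's: a scalar first-order autoregressive model with one control input, y_t = a*y_{t-1} + b*u_t + noise, with known noise variance and a Gaussian prior on (a, b). The paper's multivariate model, its treatment of unknown noise, and its message-passing schedule are not reproduced here; all names and parameter values are illustrative assumptions.

```python
import random


def update(precision, eta, phi, y, noise_var):
    # One conjugate Gaussian update in information form:
    # precision += phi phi^T / noise_var, eta += phi * y / noise_var.
    new_precision = [
        [precision[i][j] + phi[i] * phi[j] / noise_var for j in range(2)]
        for i in range(2)
    ]
    new_eta = [eta[i] + phi[i] * y / noise_var for i in range(2)]
    return new_precision, new_eta


def posterior_mean(precision, eta):
    # Solve the 2x2 system precision @ mean = eta in closed form.
    (a, b), (c, d) = precision
    det = a * d - b * c
    return [(d * eta[0] - b * eta[1]) / det, (a * eta[1] - c * eta[0]) / det]


def run_demo(n_steps=500, seed=1):
    rng = random.Random(seed)
    true_a, true_b, noise_std = 0.7, 0.5, 0.1  # hypothetical system
    precision = [[1.0, 0.0], [0.0, 1.0]]  # prior N(0, I) on (a, b)
    eta = [0.0, 0.0]
    y_prev = 0.0
    for _ in range(n_steps):
        u = rng.gauss(0.0, 1.0)  # exogenous control input
        y = true_a * y_prev + true_b * u + rng.gauss(0.0, noise_std)
        precision, eta = update(precision, eta, [y_prev, u], y, noise_std ** 2)
        y_prev = y
    return posterior_mean(precision, eta)
```

Each observation is absorbed by one `update` call and then discarded, so the posterior is exact at every step without revisiting past data. The inverse of `precision` is the posterior covariance, which carries the quantified uncertainty the abstract mentions.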


Tensor State Space-based Dynamic Multilayer Network Modeling

arXiv.org Machine Learning

Understanding the complex interactions within dynamic multilayer networks is critical for advancements in various scientific domains. Existing models often fail to capture such networks' temporal and cross-layer dynamics. This paper introduces a novel Tensor State Space Model for Dynamic Multilayer Networks (TSSDMN), utilizing a latent space model framework. TSSDMN employs a symmetric Tucker decomposition to represent latent node features, their interaction patterns, and layer transitions. Then by fixing the latent features and allowing the interaction patterns to evolve over time, TSSDMN uniquely captures both the temporal dynamics within layers and across different layers. The model identifiability conditions are discussed. By treating latent features as variables whose posterior distributions are approximated using a mean-field variational inference approach, a variational Expectation Maximization algorithm is developed for efficient model inference. Numerical simulations and case studies demonstrate the efficacy of TSSDMN for understanding dynamic multilayer networks.


A Gibbs Sampler for Efficient Bayesian Inference in Sign-Identified SVARs

arXiv.org Machine Learning

We develop a new algorithm for inference based on structural vector autoregressions (SVARs) identified with sign restrictions. The key insight of our algorithm is to break away from the accept-reject tradition associated with sign-identified SVARs. We show that embedding elliptical slice sampling within a Gibbs sampler can deliver dramatic gains in speed and turn previously infeasible applications into feasible ones. We provide a tractable example to illustrate the power of elliptical slice sampling applied to sign-identified SVARs. We demonstrate the usefulness of our algorithm by applying it to a well-known small-SVAR model of the oil market featuring a tight identified set, as well as to a large SVAR model with more than 100 sign restrictions.
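Elliptical slice sampling itself is compact enough to sketch. The example below applies the generic algorithm to a toy problem, a standard normal prior in two dimensions with the restriction that both components be positive, rather than to an SVAR. It is not the paper's Gibbs sampler, and the target and names are illustrative assumptions.

```python
import math
import random


def log_lik(f):
    # Sign restriction as an indicator likelihood: both components must be positive.
    return 0.0 if f[0] > 0.0 and f[1] > 0.0 else -math.inf


def ess_step(f, rng):
    # One elliptical slice sampling update under a N(0, I) prior.
    nu = [rng.gauss(0.0, 1.0) for _ in f]  # auxiliary draw from the prior
    u = rng.random() or 1e-300  # guard against log(0)
    log_y = log_lik(f) + math.log(u)  # slice height
    theta = rng.uniform(0.0, 2.0 * math.pi)
    lo, hi = theta - 2.0 * math.pi, theta
    while True:
        proposal = [fi * math.cos(theta) + ni * math.sin(theta) for fi, ni in zip(f, nu)]
        if log_lik(proposal) > log_y:
            return proposal
        # Shrink the angle bracket toward 0, where the proposal equals f.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)


def sample(n, seed=0):
    rng = random.Random(seed)
    f = [1.0, 1.0]  # starting point that satisfies the restriction
    draws = []
    for _ in range(n):
        f = ess_step(f, rng)
        draws.append(f)
    return draws
```

Every call to `ess_step` returns a draw that satisfies the restriction, because the bracket shrinks toward the current feasible point. A plain accept-reject scheme, by contrast, discards any proposal that violates a restriction, which becomes costly when the feasible region is small.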


JojoSCL: Shrinkage Contrastive Learning for single-cell RNA sequence Clustering

arXiv.org Machine Learning

Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular processes by enabling gene expression analysis at the individual cell level. Clustering allows for the identification of cell types and the further discovery of intrinsic patterns in single-cell data. However, the high dimensionality and sparsity of scRNA-seq data continue to challenge existing clustering models. In this paper, we introduce JojoSCL, a novel self-supervised contrastive learning framework for scRNA-seq clustering. JojoSCL incorporates a shrinkage estimator based on hierarchical Bayesian estimation and optimized using Stein's Unbiased Risk Estimate (SURE), which adjusts gene expression estimates towards more reliable cluster centroids to reduce intra-cluster dispersion, and thereby refines both instance-level and cluster-level contrastive learning. Experiments on ten scRNA-seq datasets substantiate that JojoSCL consistently outperforms prevalent clustering methods, with further validation of its practicality through robustness analysis and ablation studies. JojoSCL's code is available at: https://github.com/ziwenwang28/JojoSCL.
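The general idea of SURE-tuned shrinkage toward a centroid can be sketched as follows. This is a textbook-style illustration under the assumptions of Gaussian noise with known variance and a centroid treated as fixed. It is not JojoSCL's estimator, whose details are in the linked repository, and the function name is hypothetical.

```python
def sure_shrink(x, centroid, noise_var):
    # Shrink x toward a fixed centroid: estimate = x - lam * (x - centroid).
    # For x ~ N(theta, noise_var * I) in p dimensions, with d2 = ||x - centroid||^2,
    # SURE(lam) = p * noise_var + lam**2 * d2 - 2 * lam * p * noise_var,
    # which is minimized at lam = p * noise_var / d2 (clipped to at most 1).
    p = len(x)
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
    lam = 1.0 if d2 == 0.0 else min(1.0, p * noise_var / d2)
    estimate = [xi - lam * (xi - ci) for xi, ci in zip(x, centroid)]
    return estimate, lam
```

Points far from the centroid relative to the noise level are shrunk only slightly, while points within the noise level of the centroid collapse onto it, which is how shrinkage reduces intra-cluster dispersion.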


Hierarchical Bayesian Knowledge Tracing in Undergraduate Engineering Education

arXiv.org Machine Learning

Educators teaching entry-level university engineering modules face the challenge of identifying which topics students find most difficult and how to support diverse student needs effectively. This study demonstrates a rigorous yet interpretable statistical approach -- hierarchical Bayesian modeling -- that leverages detailed student response data to quantify both skill difficulty and individual student abilities. Using a large-scale dataset from an undergraduate Statics course, we identified clear patterns of skill mastery and uncovered distinct student subgroups based on their learning trajectories. Our analysis reveals that certain concepts consistently present challenges, requiring targeted instructional support, while others are readily mastered and may benefit from enrichment activities. Importantly, the hierarchical Bayesian method provides educators with intuitive, reliable metrics without sacrificing predictive accuracy. This approach allows for data-informed decisions, enabling personalized teaching strategies to improve student engagement and success. By combining robust statistical methods with clear interpretability, this study equips educators with actionable insights to better support diverse learner populations.