Goto

Collaborating Authors

 Bayesian Learning


Inference-Time Scaling of Diffusion Language Models with Particle Gibbs Sampling

arXiv.org Artificial Intelligence

Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), a novel inference-time algorithm enabling trajectory-level refinement while preserving generation perplexity under reward optimization. PG-DLM constructs a Markov chain over full denoising trajectories and applies a conditional sequential Monte Carlo kernel to resample them. We derive theoretical guarantees for convergence, including asymptotic consistency and variance bounds. Within this framework, we further analyze trade-offs across four key axes for inference-time scaling under fixed budgets: iterations, samples, denoising steps, and reward estimation. Our analysis shows scaling iterations achieves the best reward-perplexity trade-off. Empirically, PG-DLM consistently outperforms prior methods using MDLM and LLaDA-8B as base models across a wide range of compute budgets for reward-guided generation tasks including toxicity and sentiment control as well as linguistic acceptability.


A fast non-reversible sampler for Bayesian finite mixture models

arXiv.org Machine Learning

Finite mixtures are a cornerstone of Bayesian modelling, and it is well-known that sampling from the resulting posterior distribution can be a hard task. In particular, popular reversible Markov chain Monte Carlo schemes are often slow to converge when the number of observations $n$ is large. In this paper we introduce a novel and simple non-reversible sampling scheme for Bayesian finite mixture models, which is shown to drastically outperform classical samplers in many scenarios of interest, especially during convergence phase and when components in the mixture have non-negligible overlap. At the theoretical level, we show that the performance of the proposed non-reversible scheme cannot be worse than the standard one, in terms of asymptotic variance, by more than a factor of four; and we provide a scaling limit analysis suggesting that the non-reversible sampler can reduce the convergence time from O$(n^2)$ to O$(n)$. We also discuss why the statistical features of mixture models make them an ideal case for the use of non-reversible discrete samplers.


Total Robustness in Bayesian Nonlinear Regression for Measurement Error Problems under Model Misspecification

arXiv.org Machine Learning

Modern regression analyses are often undermined by covariate measurement error, misspecification of the regression model, and misspecification of the measurement error distribution. We present, to the best of our knowledge, the first Bayesian nonparametric framework targeting total robustness that tackles all three challenges in general nonlinear regression. The framework assigns a Dirichlet process prior to the latent co-variate-response distribution and updates it with posterior pseudo-samples of the latent covariates, thereby providing the Dirichlet process posterior with observation-informed latent inputs and yielding estimators that minimise the discrepancy between Dirichlet process realisations and the model-induced joint law. This design allows practitioners to (i) encode prior beliefs, (ii) choose between pseudo-sampling latent covariates or working directly with error-prone observations, and (iii) tune the influence of prior and data. We establish generalisation bounds that tighten whenever the prior or pseudo-sample generator aligns with the underlying data generating process, ensuring robustness without sacrificing consistency. A gradient-based algorithm enables efficient computations; simulations and two real-world studies show lower estimation error and reduced estimation sensitivity to misspecification compared to Bayesian and frequentist competitors. The framework, therefore, offers a practical and interpretable paradigm for trustworthy regression when data and models are jointly imperfect.


Oracle-based Uniform Sampling from Convex Bodies

arXiv.org Machine Learning

We propose new Markov chain Monte Carlo algorithms to sample a uniform distribution on a convex body $K$. Our algorithms are based on the Alternating Sampling Framework/proximal sampler, which uses Gibbs sampling on an augmented distribution and assumes access to the so-called restricted Gaussian oracle (RGO). The key contribution of this work is the efficient implementation of RGO for uniform sampling on $K$ via rejection sampling and access to either a projection oracle or a separation oracle on $K$. In both oracle cases, we establish non-asymptotic complexities to obtain unbiased samples where the accuracy is measured in Rรฉnyi divergence or $ฯ‡^2$-divergence.


Injecting Measurement Information Yields a Fast and Noise-Robust Diffusion-Based Inverse Problem Solver

arXiv.org Artificial Intelligence

Diffusion models have been firmly established as principled zero-shot solvers for linear and nonlinear inverse problems, owing to their powerful image prior and iterative sampling algorithm. These approaches often rely on Tweedie's formula, which relates the diffusion variate $\mathbf{x}_t$ to the posterior mean $\mathbb{E} [\mathbf{x}_0 | \mathbf{x}_t]$, in order to guide the diffusion trajectory with an estimate of the final denoised sample $\mathbf{x}_0$. However, this does not consider information from the measurement $\mathbf{y}$, which must then be integrated downstream. In this work, we propose to estimate the conditional posterior mean $\mathbb{E} [\mathbf{x}_0 | \mathbf{x}_t, \mathbf{y}]$, which can be formulated as the solution to a lightweight, single-parameter maximum likelihood estimation problem. The resulting prediction can be integrated into any standard sampler, resulting in a fast and memory-efficient inverse solver. Our optimizer is amenable to a noise-aware likelihood-based stopping criteria that is robust to measurement noise in $\mathbf{y}$. We demonstrate comparable or improved performance against a wide selection of contemporary inverse solvers across multiple datasets and tasks.


Bayesian E(3)-Equivariant Interatomic Potential with Iterative Restratification of Many-body Message Passing

arXiv.org Artificial Intelligence

Machine learning potentials (MLPs) have become essential for large-scale atomistic simulations, enabling ab initio-level accuracy with computational efficiency. However, current MLPs struggle with uncertainty quantification, limiting their reliability for active learning, calibration, and out-of-distribution (OOD) detection. We address these challenges by developing Bayesian E(3) equivariant MLPs with iterative restratification of many-body message passing. Our approach introduces the joint energy-force negative log-likelihood (NLL$_\text{JEF}$) loss function, which explicitly models uncertainty in both energies and interatomic forces, yielding superior accuracy compared to conventional NLL losses. We systematically benchmark multiple Bayesian approaches, including deep ensembles with mean-variance estimation, stochastic weight averaging Gaussian, improved variational online Newton, and laplace approximation by evaluating their performance on uncertainty prediction, OOD detection, calibration, and active learning tasks. We further demonstrate that NLL$_\text{JEF}$ facilitates efficient active learning by quantifying energy and force uncertainties. Using Bayesian active learning by disagreement (BALD), our framework outperforms random sampling and energy-uncertainty-based sampling. Our results demonstrate that Bayesian MLPs achieve competitive accuracy with state-of-the-art models while enabling uncertainty-guided active learning, OOD detection, and energy/forces calibration. This work establishes Bayesian equivariant neural networks as a powerful framework for developing uncertainty-aware MLPs for atomistic simulations at scale.


Scalable Deep Generative Relational Model with High-Order Node Dependence

Neural Information Processing Systems

We propose a probabilistic framework for modelling and exploring the latent structure of relational data. Given feature information for the nodes in a network, the scalable deep generative relational model (SDREM) builds a deep network architecture that can approximate potential nonlinear mappings between nodes' feature information and the nodes' latent representations. Our contribution is two-fold: (1) We incorporate high-order neighbourhood structure information to generate the latent representations at each node, which vary smoothly over the network.



3948ead63a9f2944218de038d8934305-Reviews.html

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The bottom line of this paper is an efficient algorithm for finding maximum likelihood estimators for elliptically contoured distributions, a class of densities that includes the Gaussian and various generalizations of it. For the Gaussian itself, that optimization is straightforward, it's the generalizations where the new algorithm provides real advantages. One could argue that this focus on a relatively arcane family of distributions (Kotz-type) limits the utility of this paper. But I think it's actually the other way round: The paper may spark new interest at NIPS in these models.