Adaptive Stepsizing for Stochastic Gradient Langevin Dynamics in Bayesian Neural Networks
Rajpal, Rajit; Leimkuhler, Benedict; Jiang, Yuanhao
Bayesian Neural Networks (BNNs) provide a framework for quantifying uncertainty in deep learning models by placing a posterior distribution over the weights, p(θ|D). Algorithms such as Stochastic Gradient Langevin Dynamics (SGLD) extend classical MCMC to the big-data setting by leveraging stochastic gradients. However, the loss landscape of deep neural networks is notoriously complex, characterized by pathological curvature and saddle points Kim et al. [2020]. Several methods have introduced adaptive step sizes or preconditioning to improve the convergence of stochastic gradient MCMC (SGMCMC) on challenging loss landscapes, including geometry-based schemes such as SGRLD and SGRHMC Patterson and Teh [2013], Ma et al. [2015], and practical variants such as pSGLD Li et al. [2015]. Yet, as discussed in Ma et al. [2015], Rensmeyer and Niggemann [2024] and Section 2.2, these methods are biased unless the dynamics are augmented by a computationally expensive divergence term. Adaptive step sizes can be viewed as an isotropic but dynamic form of preconditioning Leroy et al. [2024]. Building on the recent formulation of Leimkuhler et al. [2025], we revisit adaptive step size methods for SGLD in Bayesian sampling by introducing SA-SGLD. Importantly, this scheme avoids computing the divergence term by using statistical reweighting instead. We provide theoretical foundations and show, using small examples and a Bayesian neural network, that this method can improve performance compared to SGLD.
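For orientation, the fixed-step SGLD baseline referred to above can be sketched on a toy problem. This is only an illustration of standard SGLD, not the SA-SGLD scheme introduced in the paper: it assumes a one-dimensional model with unit-variance Gaussian likelihood and a Gaussian prior on the mean, and the function name, hyperparameters, and synthetic data are hypothetical choices made for the example.

```python
import math
import random


def sgld_gaussian_mean(data, step_size=1e-3, batch_size=10, n_steps=20000,
                       prior_var=100.0, seed=0):
    """Fixed-step SGLD for the mean of a unit-variance Gaussian (toy model).

    Model (assumed for illustration): x_i ~ N(theta, 1), theta ~ N(0, prior_var).
    """
    rng = random.Random(seed)
    n_data = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Unbiased minibatch estimate of the gradient of log p(theta | D):
        # prior term plus the rescaled sum of per-example likelihood gradients.
        grad_prior = -theta / prior_var
        grad_lik = (n_data / batch_size) * sum(x - theta for x in batch)
        grad = grad_prior + grad_lik
        # Euler-Maruyama step of the Langevin SDE with constant step size h:
        # theta <- theta + (h / 2) * grad + sqrt(h) * N(0, 1).
        noise = rng.gauss(0.0, 1.0)
        theta += 0.5 * step_size * grad + math.sqrt(step_size) * noise
        samples.append(theta)
    return samples


# Synthetic data (hypothetical): 100 draws from N(2, 1).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(100)]

samples = sgld_gaussian_mean(data)
burn_in = len(samples) // 5
posterior_mean_estimate = sum(samples[burn_in:]) / (len(samples) - burn_in)

# Closed-form posterior mean for this conjugate model, used as a reference.
exact_posterior_mean = sum(data) / (len(data) + 1.0 / 100.0)
```

In this conjugate setting the sample average can be compared directly with the closed-form posterior mean. The step size here is a single constant for the whole run, which is the quantity that adaptive step size schemes vary along the trajectory.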
Nov-19-2025