Underdamped Langevin diffusion

On the Ergodicity, Bias and Asymptotic Normality of Randomized Midpoint Sampling Method

Neural Information Processing Systems

The randomized midpoint method, proposed by [40], has emerged as an optimal discretization procedure for simulating the continuous time underdamped Langevin diffusion. In this paper, we analyze several probabilistic properties of the randomized midpoint discretization method, considering both overdamped and underdamped Langevin dynamics. We first characterize the stationary distribution of the discrete chain obtained with constant step-size discretization and show that it is biased away from the target distribution. Notably, the step-size needs to go to zero to obtain asymptotic unbiasedness. Next, we establish the asymptotic normality of numerical integration using the randomized midpoint method and highlight the relative advantages and disadvantages over other discretizations. Our results collectively provide several insights into the behavior of the randomized midpoint discretization method, including obtaining confidence intervals for numerical integrations.
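To make the abstract above concrete, here is a minimal sketch of a randomized midpoint step for the overdamped Langevin diffusion $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dW_t$. It is an illustration under assumptions, not the paper's code: the one-dimensional potential $f(x) = x^2/2$ (target $N(0,1)$), the function names, and the step size are all chosen here for the example. Each step draws a uniform $\alpha \in [0,1)$, takes an Euler step of length $\alpha h$ to a random intermediate point, and then takes the full step using the gradient at that point. Both Brownian increments come from the same path.

```python
import math
import random


def grad_f(x):
    # Assumed toy potential f(x) = x^2 / 2, so the target is N(0, 1).
    return x


def randomized_midpoint_chain(n_steps, h, x0=0.0, seed=0):
    """Run a randomized midpoint chain for overdamped Langevin dynamics."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        alpha = rng.random()  # uniform random fraction of the step
        # Brownian increments over [0, alpha*h] and [alpha*h, h] of one path.
        w_mid = math.sqrt(alpha * h) * rng.gauss(0.0, 1.0)
        w_rest = math.sqrt((1.0 - alpha) * h) * rng.gauss(0.0, 1.0)
        # Euler step of length alpha*h to the random intermediate point.
        x_mid = x - alpha * h * grad_f(x) + math.sqrt(2.0) * w_mid
        # Full step of length h using the gradient at the intermediate point.
        x = x - h * grad_f(x_mid) + math.sqrt(2.0) * (w_mid + w_rest)
        samples.append(x)
    return samples
```

For this Gaussian toy target, the sample mean and variance of a long run, for example `randomized_midpoint_chain(100000, 0.1)`, should land close to 0 and 1. Any remaining gap at a fixed step size is the kind of constant-step-size bias the abstract describes.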


Underdamped Langevin MCMC with third order convergence

Scott, Maximilian, O'Kane, Dáire, Jelinčič, Andraž, Foster, James

arXiv.org Machine Learning

In this paper, we propose a new numerical method for the underdamped Langevin diffusion (ULD) and present a non-asymptotic analysis of its sampling error in the 2-Wasserstein distance when the $d$-dimensional target distribution $p(x)\propto e^{-f(x)}$ is strongly log-concave and has varying degrees of smoothness. Precisely, under the assumptions that the gradient and Hessian of $f$ are Lipschitz continuous, our algorithm achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon)$ and $\mathcal{O}(\sqrt{d}/\sqrt{\varepsilon})$ steps respectively. Therefore, our algorithm has a complexity similar to that of other popular Langevin MCMC algorithms under matching assumptions. However, if we additionally assume that the third derivative of $f$ is Lipschitz continuous, then our algorithm achieves a 2-Wasserstein error of $\varepsilon$ in $\mathcal{O}(\sqrt{d}/\varepsilon^{\frac{1}{3}})$ steps. To the best of our knowledge, this is the first gradient-only method for ULD with third-order convergence. To support our theory, we perform Bayesian logistic regression across a range of real-world datasets, where our algorithm achieves competitive performance compared to an existing underdamped Langevin MCMC algorithm and the popular No U-Turn Sampler (NUTS).
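For reference, one common way to write the ULD targeting $p(x)\propto e^{-f(x)}$ is given below. The friction parameter $\gamma > 0$ and the scaling parameter $u > 0$ are notation assumed here for illustration; the paper's own parametrization may differ.

```latex
\begin{aligned}
dx_t &= v_t\,dt,\\
dv_t &= -\gamma\, v_t\,dt - u\,\nabla f(x_t)\,dt + \sqrt{2\gamma u}\,dW_t .
\end{aligned}
```

Under this convention the stationary density of $(x_t, v_t)$ is proportional to $\exp\!\big(-f(x) - \|v\|^2/(2u)\big)$, so its $x$-marginal is the target $p$. The step counts quoted in the abstract refer to discretizations of a diffusion of this type.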




Error estimates between SGD with momentum and underdamped Langevin diffusion

Guillin, Arnaud, Wang, Yu, Xu, Lihu, Yang, Haoran

arXiv.org Machine Learning

Stochastic gradient descent with momentum is a popular variant of stochastic gradient descent, which has recently been reported to have a close relationship with the underdamped Langevin diffusion. In this paper, we establish a quantitative error estimate between them in the 1-Wasserstein and total variation distances.
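One elementary way to see the connection, sketched here under assumptions rather than taken from the paper, is that a semi-implicit Euler discretization of the ULD $dx = v\,dt$, $dv = -\gamma v\,dt - \nabla f(x)\,dt + \sqrt{2\gamma}\,dW$ with step $h$ coincides with heavy-ball SGD when the learning rate is $\eta = h^2$, the momentum is $\beta = 1 - \gamma h$, and the gradient noise is a rescaled Gaussian. The quadratic potential, the function names, and this particular scaling are illustrative choices; the paper may use a different correspondence.

```python
import math
import random


def grad_f(x):
    # Assumed toy objective f(x) = x^2 / 2.
    return x


def sgd_momentum(x0, eta, beta, grad_noise):
    """Heavy-ball SGD: p <- beta*p - eta*(grad + noise), x <- x + p."""
    x, p = x0, 0.0
    path = [x]
    for z in grad_noise:
        p = beta * p - eta * (grad_f(x) + z)
        x = x + p
        path.append(x)
    return path


def uld_euler(x0, h, gamma, xis):
    """Semi-implicit Euler scheme for ULD: update v first, then x with new v."""
    x, v = x0, 0.0
    path = [x]
    for xi in xis:
        v = v - h * gamma * v - h * grad_f(x) + math.sqrt(2.0 * gamma * h) * xi
        x = x + h * v
        path.append(x)
    return path


def matched_paths(h=0.1, gamma=1.0, n_steps=200, seed=0):
    """Drive both recursions with the same Gaussian draws under the matching."""
    rng = random.Random(seed)
    xis = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    # Gradient noise that reproduces the ULD noise term after scaling by eta.
    zs = [-math.sqrt(2.0 * gamma / h) * xi for xi in xis]
    return sgd_momentum(1.0, h * h, 1.0 - gamma * h, zs), uld_euler(1.0, h, gamma, xis)
```

With the substitution $p_k = h\,v_k$ the two recursions are algebraically identical, so `matched_paths()` returns two trajectories that agree up to floating-point rounding. Real SGD noise is not Gaussian in general, which is one reason a quantitative error estimate of the kind described above is needed.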


Kinetic Interacting Particle Langevin Monte Carlo

Oliva, Paul Felix Valsecchi, Akyildiz, O. Deniz

arXiv.org Machine Learning

This paper introduces and analyses interacting underdamped Langevin algorithms, termed Kinetic Interacting Particle Langevin Monte Carlo (KIPLMC) methods, for statistical inference in latent variable models. We propose a diffusion process that evolves jointly in the space of parameters and latent variables and exploit the fact that the stationary distribution of this diffusion concentrates around the maximum marginal likelihood estimate of the parameters. We then provide two explicit discretisations of this diffusion as practical algorithms to estimate parameters of statistical models. For each algorithm, we obtain nonasymptotic rates of convergence for the case where the joint log-likelihood is strongly concave with respect to latent variables and parameters. In particular, we provide convergence analysis for the diffusion together with the discretisation error, providing convergence rate estimates for the algorithms in Wasserstein-2 distance. To demonstrate the utility of the introduced methodology, we provide numerical experiments that show the effectiveness of the proposed diffusion for statistical inference and the stability of the numerical integrators utilised for discretisation. Our setting covers a broad range of applications, including unsupervised learning, statistical inference, and inverse problems.
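The idea of a kinetic diffusion over parameters and latent-variable particles can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the model ($x \sim N(\theta, 1)$, $y \mid x \sim N(x, 1)$, so the maximum marginal likelihood estimate is $\theta^\ast = y$), the noise scaling $\sqrt{2\gamma/N}$ on the parameter velocity, and the simple Euler-type update. This is not one of the paper's two discretisations.

```python
import math
import random


def kinetic_particle_sketch(y, n_particles=20, h=0.05, gamma=2.0,
                            n_steps=20000, seed=0):
    """Toy kinetic interacting particle system; returns the theta trajectory.

    Assumed negative joint log-likelihood: U(theta, x) = (x - theta)^2 / 2 + (y - x)^2 / 2.
    """
    rng = random.Random(seed)
    theta, v_theta = 0.0, 0.0
    xs = [0.0] * n_particles
    vs = [0.0] * n_particles
    thetas = []
    for _ in range(n_steps):
        # Gradients at the current state: dU/dtheta averaged over particles,
        # and dU/dx for each particle.
        grad_theta = sum(theta - x for x in xs) / n_particles
        grads_x = [2.0 * x - theta - y for x in xs]
        # Velocity updates: friction, force, and injected noise.
        v_theta += (-h * gamma * v_theta - h * grad_theta
                    + math.sqrt(2.0 * gamma * h / n_particles) * rng.gauss(0.0, 1.0))
        for i in range(n_particles):
            vs[i] += (-h * gamma * vs[i] - h * grads_x[i]
                      + math.sqrt(2.0 * gamma * h) * rng.gauss(0.0, 1.0))
        # Position updates using the new velocities.
        theta += h * v_theta
        for i in range(n_particles):
            xs[i] += h * vs[i]
        thetas.append(theta)
    return thetas
```

In this toy model the parameter trajectory, after a burn-in, should fluctuate around the observed value `y`, and the fluctuations shrink as `n_particles` grows. This mirrors the concentration around the maximum marginal likelihood estimate described in the abstract.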