A note on the unique properties of the Kullback-Leibler divergence for sampling via gradient flows
arXiv.org Artificial Intelligence
Sampling from a target probability distribution whose density is known only up to a normalisation constant is a fundamental task in computational statistics and machine learning. A natural way to formulate this task is as the optimisation of a functional measuring the dissimilarity to the target distribution. From this point of view, one can derive many popular sampling frameworks, including variational inference [Blei et al., 2017], algorithms based on diffusions [Roberts and Tweedie, 1996, Durmus et al., 2019] and deterministic flows [Liu, 2017], and algorithms based on importance sampling [Chopin et al., 2024, Crucinio and Pathiraja, 2025]. The connection between minimisation of a divergence and Monte Carlo algorithms is established through gradient flows over the space of probability measures (see, e.g., Chewi et al. [2025] and Carrillo et al. [2024] for a recent review): different metrics over this space lead to different differential equations, whose discretisations correspond to many popular Monte Carlo algorithms. The most widely used divergence is the reverse Kullback-Leibler (KL) divergence, whose gradient flow with respect to the Wasserstein-2 metric can be implemented by a Langevin diffusion [Jordan et al., 1998] and easily discretised in time, resulting in the unadjusted Langevin algorithm [Roberts and Tweedie, 1996].
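As a rough illustration of the time discretisation mentioned above, the following is a minimal sketch of the unadjusted Langevin algorithm in one dimension, written as the Euler-Maruyama update x_{k+1} = x_k + h ∇log π(x_k) + sqrt(2h) ξ_k with ξ_k standard Gaussian. The function names, the step size h = 0.05, the burn-in length and the standard Gaussian target are all assumptions chosen for the example and do not come from the paper. Only the gradient of the log-density appears in the update, so the normalisation constant of the target is never needed.

```python
import math
import random


def ula_sample(grad_log_density, x0, step_size, n_steps, seed=0):
    """Run a 1D unadjusted Langevin chain (hypothetical helper for illustration).

    Each iteration applies the Euler-Maruyama update
        x <- x + h * grad_log_density(x) + sqrt(2 * h) * xi,  xi ~ N(0, 1).
    """
    rng = random.Random(seed)
    x = float(x0)
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step_size * grad_log_density(x) + math.sqrt(2.0 * step_size) * noise
        samples.append(x)
    return samples


def grad_log_standard_gaussian(x):
    # Assumed example target: log pi(x) = -x^2 / 2 + const, so the gradient is -x.
    return -x


samples = ula_sample(grad_log_standard_gaussian, x0=3.0, step_size=0.05, n_steps=20000)

# Discard an initial stretch of the chain before summarising (assumed burn-in).
burned = samples[2000:]
mean = sum(burned) / len(burned)
var = sum((s - mean) ** 2 for s in burned) / len(burned)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

Because no accept/reject correction is applied, the chain targets a slightly biased distribution: for this Gaussian example the stationary variance of the recursion is 1 / (1 - h/2) rather than 1, and the bias shrinks as the step size h decreases.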
Jul-8-2025