Wasserstein Contraction of Coordinate Ascent Variational Inference
Caprio, Rocco, Corenflos, Adrien, Power, Sam
Finding approximations to an intractable probability distribution π of interest (usually known only up to a normalizing constant) is a key problem in scientific computing. Variational Inference stands out as a particularly attractive tool for this task, owing to its statistical and computational efficiency, and it has been the framework underlying many advances in computational statistics over the past half century (Parisi, 1980; Hinton and Van Camp, 1993; Jordan et al., 1999; Bishop and Nasrabadi, 2006). The central idea is to seek a tractable approximation to π within a chosen family of tractable distributions Q by minimizing a divergence to π over that'variational' family. Often, it is convenient or well-motivated to work with the family of product (or tensor, or factorized) distributions Q = P m, and define optimality through minimisation of the Kullback-Leibler (KL) divergence (also'relative entropy') min KL(ϱ||π): ϱ P m . A key practical aspect of working with this particular loss function is that in solving the associated optimisation problem, one is only required to compute expectations under the tractable variational distribution ϱ, rather than under the intractable target distribution π. In Bayesian statistics, π typically represents the joint posterior distribution of latent variables z Z and some parameters β B given observed data y Y. In these cases, we often choose m = 2 and seek the best variational approximation µ(dz) ν(dβ) to π to solve min KL(µ ν||π): µ P(Z), ν P(B) . The coordinate ascent variational inference algorithm (CAVI, Bishop and Nasrabadi, 2006; Blei et al., 2017) solves this problem by iteratively minimizing the Kullback-Leibler divergence with respect to one element at a time: given a starting point ν0, it iterates µk:= argmin
May-29-2026