Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow

Lénaïc Chizat, Maria Colombo, Roberto Colombo, Xavier Fernández-Real

arXiv.org Machine Learning 

Stein Variational Gradient Descent (SVGD), introduced in [LW16], is a deterministic interacting-particle method for sampling from a target probability measure $\pi \propto e^{-V}$ that only requires access to $V$. In the mean-field and continuous-time limit, the distribution of the particles converges to a flow $(\rho_t)$ in the space of probability measures. This flow solves a variant of the Fokker-Planck equation in which the velocity field is smoothed by a weighted convolution with a positive definite kernel [LLN19]. It can be interpreted as the gradient flow of the relative entropy $H(\cdot|\pi)$ with respect to a "kernelized" Wasserstein metric [Liu17]. The goal of this paper is to investigate the convergence of $(\rho_t)$ towards $\pi$. To this end, we focus on the model case of Riesz kernels of order $s$ on the $d$-dimensional torus $\mathbb{T}^d$. This is a family of translation-invariant kernels whose Fourier coefficients decay as $|\xi|^{-2s}$. The parameter $s$ therefore directly controls the "smoothing strength" of the interaction. In particular, continuous kernels correspond to $s > d/2$, $C^1$ kernels to $s > (d+1)/2$, and $C^2$ kernels to $s > (d+2)/2$.

What is known: qualitative weak convergence

The starting point of convergence analyses is the energy dissipation formula [Liu17]

$$\frac{\mathrm{d}}{\mathrm{d}t} H(\rho_t|\pi) = -I_s(\rho_t|\pi), \qquad (1.1)$$

Authors are listed in alphabetical order.
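To make the smoothed velocity field concrete, here is a minimal sketch of finite-particle SVGD on the one-dimensional torus $\mathbb{T} = \mathbb{R}/2\pi\mathbb{Z}$. It is not the authors' code. Each particle moves with the empirical version of the mean-field velocity $v = -K * (\rho \nabla \log(\rho/\pi)) = -K * (\nabla\rho + \rho\nabla V)$. After integration by parts, the $\nabla\rho$ term becomes the repulsion $\int K'(y - x)\,\mathrm{d}\rho(y)$ for an even kernel $K$.

Every concrete choice below is a hypothetical assumption made for illustration:

- a Riesz-type kernel truncated at 32 Fourier modes, with zero-mode coefficient 1;
- order $s = 1.5$, which in $d = 1$ exceeds $(d+1)/2 = 1$, so the untruncated kernel is $C^1$;
- the potential $V(x) = \cos x$;
- an explicit Euler time step.

```python
import numpy as np

# All parameters below are hypothetical illustration choices, not the paper's.
S_ORDER = 1.5   # Riesz order s; in d = 1, s > (d + 1) / 2 = 1 gives a C^1 kernel
N_MODES = 32    # Fourier truncation level (assumption)
N_PART = 40     # number of particles
STEP = 0.05     # explicit Euler time step
N_STEPS = 2000  # final time N_STEPS * STEP = 100

ks = np.arange(1, N_MODES + 1)
coef = ks ** (-2.0 * S_ORDER)  # Fourier coefficients |xi|^(-2s) for xi = +-k


def kernel_and_grad(diff):
    """Truncated kernel K(x) = 1 + 2 * sum_k k^(-2s) cos(kx) on R / (2 pi Z)
    and its derivative K'(x), evaluated elementwise on an array of differences.
    The zero-mode coefficient 1 is an assumption."""
    phase = diff[..., None] * ks
    K = 1.0 + 2.0 * np.sum(coef * np.cos(phase), axis=-1)
    dK = -2.0 * np.sum(coef * ks * np.sin(phase), axis=-1)
    return K, dK


def grad_V(x):
    # Hypothetical potential V(x) = cos(x): pi proportional to exp(-cos x) peaks at x = pi.
    return -np.sin(x)


def svgd_step(x):
    diff = x[None, :] - x[:, None]                   # diff[i, j] = x_j - x_i
    K, dK = kernel_and_grad(diff)
    drift = -(K * grad_V(x)[None, :]).mean(axis=1)  # -(1/N) sum_j K(x_j - x_i) V'(x_j)
    repulsion = dK.mean(axis=1)                     # (1/N) sum_j K'(x_j - x_i)
    return x + STEP * (drift + repulsion)


rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, size=N_PART)  # cluster near x = 0, where pi is smallest
start = float(np.mean(np.cos(x)))
for _ in range(N_STEPS):
    x = svgd_step(x)
end = float(np.mean(np.cos(x)))

# Positive Fourier coefficients make the Gram matrix positive semidefinite.
gram, _ = kernel_and_grad(x[None, :] - x[:, None])
min_eig = float(np.linalg.eigvalsh(gram).min())
print(f"mean cos x: {start:.3f} -> {end:.3f}; min Gram eigenvalue {min_eig:.2e}")
```

The particles start clustered around $x = 0$, where $\pi$ has its lowest density. The empirical average of $\cos x$ should then move from close to 1 toward the target value $\mathbb{E}_\pi[\cos x] = -I_1(1)/I_0(1) \approx -0.446$, where $I_\nu$ are modified Bessel functions. With finitely many particles and a finite time horizon, this value is only approximated.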