




SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

Neural Information Processing Systems

Stein Variational Gradient Descent (SVGD), a popular sampling algorithm, is often described as the kernelized gradient flow for the Kullback-Leibler divergence in the geometry of optimal transport. We introduce a new perspective on SVGD that instead views SVGD as the kernelized gradient flow of the chi-squared divergence. Motivated by this perspective, we provide a convergence analysis of the chi-squared gradient flow. We also show that our new perspective provides better guidelines for choosing effective kernels for SVGD.
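For readers unfamiliar with the algorithm, the standard discrete SVGD particle update can be sketched as below. This is a minimal illustration and not the paper's code: the Gaussian (RBF) kernel, the fixed bandwidth, the step size, and the function names `rbf_kernel_and_grad` and `svgd_step` are all assumptions made here for the example.

```python
import numpy as np


def rbf_kernel_and_grad(x, bandwidth):
    """Gaussian kernel matrix and its gradients for particles x of shape (n, d).

    Returns k[i, j] = k(x_j, x_i) and grad_k[i, j] = gradient of k(x_j, x_i)
    with respect to x_j. The RBF kernel is an assumption of this sketch.
    """
    diffs = x[:, None, :] - x[None, :, :]           # diffs[i, j] = x_i - x_j
    sq_dists = np.sum(diffs ** 2, axis=-1)          # shape (n, n)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    grad_k = diffs * k[:, :, None] / bandwidth ** 2
    return k, grad_k


def svgd_step(x, grad_log_pi, step_size=0.1, bandwidth=1.0):
    """One SVGD update: x_i <- x_i + step_size * phi(x_i).

    phi(x_i) = (1/n) * sum_j [k(x_j, x_i) * score(x_j) + grad_{x_j} k(x_j, x_i)],
    where score = grad log pi is supplied by the caller.
    """
    n = x.shape[0]
    k, grad_k = rbf_kernel_and_grad(x, bandwidth)
    scores = grad_log_pi(x)                         # shape (n, d)
    phi = (k @ scores + grad_k.sum(axis=1)) / n     # attraction + repulsion
    return x + step_size * phi


if __name__ == "__main__":
    # Hypothetical target: standard Gaussian, whose score is -x.
    rng = np.random.default_rng(0)
    particles = rng.normal(loc=3.0, scale=1.0, size=(50, 1))
    for _ in range(500):
        particles = svgd_step(particles, lambda z: -z)
    print(particles.mean(), particles.var())
```

The first term of `phi` pulls particles toward high-density regions of the target, while the kernel-gradient term pushes them apart; the choice of kernel governs both effects, which is why kernel selection matters in practice.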






Review for NeurIPS paper: SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence


Summary and Contributions: The paper makes the following contributions: 1) An interpretation (up to a constant factor of 2) of SVGD as the kernelized gradient flow of the chi-squared divergence, where the gradient flow of the chi-squared divergence is referred to as CSF. 2) Exponential ergodicity of CSF (in continuous time) in both KL divergence and chi-squared divergence, under a Poincaré inequality (or LSI) on the target.

Indeed, this is an issue with any kernel method (from SVM to MMD to SVGD), and it has been addressed in various ways. To be critical, there is still no "nice" way to pick a kernel. As mentioned in lines 16 and 17, a single integral operator depending on the target \pi is good (this is also along expected lines; for example, in the MMD context something similar leads to optimality properties). However, I do not fully agree with lines 27-28 that "solving high-dimensional PDEs is precisely the target of intensive research in modern numerical PDE", which is my main concern about the practical applicability of the proposed work. To the best of the reviewer's knowledge, there has been no "concrete" progress in this direction, despite several recent ad hoc approaches.
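For reference, the interpretation in contribution 1 can be written out as follows. This is a sketch in notation assumed here rather than quoted from the paper or the review: \mu_t is the law of the particles, \pi the target, K a kernel, and \mathcal{K}_\mu the integral operator it induces.

```latex
% Assumed notation: \mathcal{K}_\mu f(x) = \int K(x, y)\, f(y)\, d\mu(y).
\begin{align*}
\chi^2(\mu \,\|\, \pi)
  &= \int \Big(\frac{d\mu}{d\pi} - 1\Big)^2 \, d\pi,
  \qquad
  \nabla \frac{\delta \chi^2}{\delta \mu}
  = 2\, \nabla \frac{d\mu}{d\pi}, \\
\text{SVGD:} \quad
\partial_t \mu_t
  &= \nabla \cdot \Big( \mu_t \, \mathcal{K}_{\mu_t} \nabla \log \frac{d\mu_t}{d\pi} \Big), \\
\mathcal{K}_{\mu} \nabla \log \frac{d\mu}{d\pi}
  &= \int K(\cdot, y)\, \frac{d\pi}{d\mu}(y)\, \nabla \frac{d\mu}{d\pi}(y) \, d\mu(y)
  = \mathcal{K}_{\pi} \nabla \frac{d\mu}{d\pi}, \\
\text{hence} \quad
\partial_t \mu_t
  &= \nabla \cdot \Big( \mu_t \, \mathcal{K}_{\pi} \nabla \frac{d\mu_t}{d\pi} \Big), \\
\text{CSF:} \quad
\partial_t \mu_t
  &= \nabla \cdot \Big( \mu_t \, \nabla \frac{d\mu_t}{d\pi} \Big).
\end{align*}
```

Under this reading, SVGD is CSF with the single, target-dependent operator \mathcal{K}_\pi applied to the velocity field, and CSF itself is the Wasserstein gradient flow of \tfrac{1}{2}\chi^2(\cdot \,\|\, \pi), which accounts for the constant factor of 2.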


Review for NeurIPS paper: SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence


The reviewers had some concerns about the empirical evaluation and the lack of discrete-time results, but agree that this paper would be a useful addition to NeurIPS. Please see the reviews (and your response) for ways to improve the final manuscript (especially in terms of clarifying the context and scope of this work).