Review for NeurIPS paper: SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence

Neural Information Processing Systems 

Summary and Contributions: The paper makes the following contributions: 1) an interpretation of SVGD as a (kernelized) Wasserstein gradient flow of the chi-squared divergence, up to a constant factor of 2; the unkernelized flow is termed CSF (a sketch of this identity is given at the end of this review); 2) exponential ergodicity of CSF (in the continuous-time case) in both KL divergence and chi-squared divergence, under a Poincaré inequality (or LSI) on the target.

A natural question is how the kernel should be chosen. Indeed, this is an issue with any kernel method (from SVMs to MMD to SVGD), and it has been addressed in various ways; if one were critical, there is still no "nice" way to pick a kernel. As mentioned in Lines 16 and 17, a single integral operator depending on the target \pi is a good choice (this is also along expected lines; for example, in the MMD context something similar leads to optimality properties). However, I do not agree 100% with the claim in Lines 27-28 that "solving high-dimensional PDEs is precisely the target of intensive research in modern numerical PDE", and this is my main concern with the practical applicability of the proposed work: to the best of the reviewer's knowledge there has been no concrete progress in this direction, despite several recent ad-hoc approaches.
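For concreteness, here is a sketch of the identity behind contribution 1 as I understand it; the notation $K_\pi$ for the kernel integral operator and $\rho = \frac{d\mu}{d\pi}$ for the density ratio is mine, and boundary terms are assumed to vanish. The SVGD velocity field at the current law $\mu$ is
$$v_\mu(y) = \int \left[ k(x,y)\,\nabla \log \pi(x) + \nabla_x k(x,y) \right] d\mu(x).$$
Writing $d\mu = \rho\, d\pi$ and integrating by parts against $\pi$,
$$v_\mu(y) = -\int k(x,y)\,\nabla \rho(x)\, d\pi(x) = -\big(K_\pi \nabla \rho\big)(y), \qquad K_\pi f := \int k(\cdot,x)\, f(x)\, d\pi(x).$$
On the other hand, $\chi^2(\mu \,\|\, \pi) = \int \rho^2\, d\pi - 1$ has first variation $2\rho$, so the unkernelized Wasserstein gradient flow of the chi-squared divergence transports mass with velocity $-2\nabla\rho$. SVGD replaces $-2\nabla\rho$ by $-K_\pi \nabla\rho$, which is the sense in which it is a kernelized chi-squared flow "up to a constant factor of 2".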
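For readers who prefer code, a minimal numpy sketch of the discrete SVGD particle update being reinterpreted here (the standard Liu-Wang form; the fixed RBF bandwidth h and the function names are my own illustrative choices, not the paper's):

import numpy as np

def svgd_step(x, grad_log_pi, step=0.1, h=1.0):
    # x: (n, d) particle positions; grad_log_pi maps (n, d) -> (n, d), the score of the target \pi.
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]              # diff[i, j] = x_i - x_j
    K = np.exp(-np.sum(diff**2, axis=-1) / (2 * h))   # RBF kernel matrix k(x_i, x_j)
    # Driving term: average of k(x_j, .) grad log pi(x_j); repulsive term: average of grad_{x_j} k(x_j, .).
    phi = (K @ grad_log_pi(x) + np.sum(K[:, :, None] * diff, axis=1) / h) / n
    return x + step * phi

# e.g., for a standard Gaussian target the score is -x:
x = np.random.randn(100, 2)
for _ in range(500):
    x = svgd_step(x, lambda x: -x)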