Reviews: Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Neural Information Processing Systems 

Overall, I found the paper interesting: it offers new theory as well as numerical results comparable to the state of the art on reasonably difficult datasets. Perhaps due to space constraints, an important part of the paper, the inference algorithm (section 3.2), is poorly explained. In particular, I initially thought that the use of particles meant that the approximating distribution was a sum of Dirac delta functions, but that cannot be the case since, even with many particles, the 'posterior' would degenerate into the MAP (note that in similar work, authors either use particles when p(x) involves discrete x variables, as in Kulkarni et al, or 'smooth' the particles to approximate a continuous distribution, as in Gershman et al). Instead, it looks like the algorithm works directly on samples of the distributions q_0, q_1, ... (hence the vague 'for whatever distribution q that {x_i}_{i=1}^n currently represents'). It is tempting to consider q_i to be a kernel density estimate (a mixture of normals with fixed width), and to see whether equation 9 can be approximated stably for that representation.
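To make the sample-based reading concrete, here is a minimal sketch of one particle update as I understand it, written for an RBF kernel k(x, x') = exp(-||x - x'||^2 / h). The function name `svgd_step`, the arguments `grad_log_p`, `step` and `h`, and the fixed bandwidth are my own assumptions for illustration, not the authors' code. The first term moves each particle along a kernel-weighted average of the score, and the second term, the kernel gradient, pushes particles apart, which is what should keep the set from collapsing onto the MAP.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD-style update on particles x of shape (n, d).

    Assumptions (illustrative, not from the paper's code): RBF kernel
    k(x, x') = exp(-||x - x'||^2 / h) with a fixed bandwidth h, and a
    user-supplied grad_log_p mapping (n, d) particles to (n, d) scores.
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    k = np.exp(-np.sum(diff ** 2, axis=-1) / h)  # (n, n), symmetric
    score = grad_log_p(x)                     # (n, d)
    # Driving term: sum_j k(x_j, x_i) * grad log p(x_j)
    drive = k @ score
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j (2/h) (x_i - x_j) k_ij
    repulse = (2.0 / h) * np.sum(k[:, :, None] * diff, axis=1)
    return x + step * (drive + repulse) / n

# Toy usage: the target is a standard normal, so grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(3.0, 1.0, size=(50, 1))  # deliberately off-centre start
for _ in range(1000):
    particles = svgd_step(particles, lambda z: -z)
```

In this sketch no density is ever attached to the particles: q is represented only through the samples themselves, which matches the reading above. A kernel density estimate over the same particles would be a separate modelling choice layered on top.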