Stein Variational Gradient Descent with Matrix-Valued Kernels

Neural Information Processing Systems 

On the other hand, standard SVGD uses only first-order gradient information, and cannot leverage the advantages of second-order methods, such as Newton's method and natural gradient, to achieve better performance on challenging problems with complex loss landscapes or domains.
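To make the first-order limitation concrete, the following is a minimal sketch of one standard SVGD update with a scalar RBF kernel. It is an illustration under stated assumptions, not the matrix-valued method of this paper: the function names `rbf_kernel` and `svgd_step`, the fixed bandwidth `h`, and the step size are all hypothetical choices. The only information about the target that enters the update is the score `score(x)`, i.e. the gradient of the log density; no Hessian or other curvature term appears.

```python
import numpy as np

def rbf_kernel(x, h):
    """Scalar RBF kernel on particles x of shape (n, d).

    Returns K with K[j, i] = k(x_j, x_i) and grad_K with
    grad_K[j, i] = gradient of k(x_j, x_i) with respect to x_j.
    The bandwidth h is an assumed fixed hyperparameter.
    """
    diff = x[:, None, :] - x[None, :, :]           # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)           # shape (n, n)
    K = np.exp(-sq_dist / (2.0 * h ** 2))
    grad_K = -diff / h ** 2 * K[:, :, None]        # shape (n, n, d)
    return K, grad_K

def svgd_step(x, score, step_size=0.1, h=1.0):
    """One standard (scalar-kernel) SVGD update.

    score(x) must return the gradient of log p at each particle, shape (n, d).
    This first-order quantity is the only target information used.
    """
    n = x.shape[0]
    K, grad_K = rbf_kernel(x, h)
    s = score(x)
    # phi_i = (1/n) * sum_j [ k(x_j, x_i) * s_j + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ s + grad_K.sum(axis=0)) / n
    return x + step_size * phi

# Usage: move particles toward a standard normal target, whose score is -x.
rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, scale=1.0, size=(50, 1))
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
```

The first term in `phi` drives particles toward high-density regions along the kernel-smoothed gradient, and the second term pushes them apart. Because the step direction is fixed by the score and the kernel alone, it does not adapt to the local curvature of the target, which is the gap that second-order methods address.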