

A Stein variational Newton method

Neural Information Processing Systems

Stein variational gradient descent (SVGD) was recently proposed as a general purpose nonparametric variational inference algorithm: it minimizes the Kullback-Leibler divergence between the target distribution and its approximation by implementing a form of functional gradient descent on a reproducing kernel Hilbert space [Liu & Wang, NIPS 2016]. In this paper, we accelerate and generalize the SVGD algorithm by including second-order information, thereby approximating a Newton-like iteration in function space. We also show how second-order information can lead to more effective choices of kernel. We observe significant computational gains over the original SVGD algorithm in multiple test cases.
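The functional gradient step the abstract describes can be sketched in a few lines. The following is a minimal NumPy illustration of the original SVGD iteration (not the paper's Newton variant); the standard-Gaussian target, fixed RBF bandwidth, and step size are illustrative assumptions, not choices from the paper:

```python
import numpy as np

def svgd_step(X, grad_log_p, h=1.0, step=0.1):
    """One SVGD update: particles X of shape (n, d) move along the kernelized
    functional gradient of the KL divergence [Liu & Wang, NIPS 2016]."""
    diff = X[:, None, :] - X[None, :, :]                     # x_j - x_i, shape (n, n, d)
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))   # RBF kernel k(x_j, x_i)
    grad_K = -(diff / h ** 2) * K[:, :, None]                # grad_{x_j} k(x_j, x_i)
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ grad_log_p(X) + grad_K.sum(axis=0)) / X.shape[0]
    return X + step * phi

# Illustrative target: standard normal, so grad log p(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(50, 1))  # particles start far from the target
for _ in range(500):
    X = svgd_step(X, lambda x: -x)
```

The second term of `phi` acts as a repulsive force between particles, which is what keeps the ensemble spread out rather than collapsing onto the mode; the Newton variant proposed in the paper replaces the plain gradient step with a second-order update.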


Reviews: A Stein variational Newton method

Neural Information Processing Systems

Summary: SVGD iteratively moves a set of particles toward the target by choosing a perturbative direction in an RKHS that maximally decreases the KL divergence with the target distribution. The paper proposes to add second-order information to the SVGD updates; preliminary empirical results show that their method converges faster in a few cases. The paper is well written, and the proofs seem correct. An important reason for using second-order information is the hope of achieving a faster convergence rate. My major concern is the lack of theoretical analysis of convergence rate in this paper: 1) An appealing property of SVGD is that the optimal rate of decrease equals the Stein discrepancy D_F(q‖p), where F is a function set that includes all possible velocity fields.
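For context, the identity the reviewer refers to is the result of Liu & Wang [NIPS 2016]: within the unit ball of the vector-valued RKHS, the steepest-descent direction of the KL divergence has a closed form, and its rate of decrease equals the (kernelized) Stein discrepancy. A standard way to write this is:

```latex
% Optimal perturbation direction in the unit ball of the RKHS:
\phi^*(\cdot) = \mathbb{E}_{x \sim q}\!\left[ k(x,\cdot)\,\nabla_x \log p(x) + \nabla_x k(x,\cdot) \right]
% Rate of decrease of KL along phi^* equals the Stein discrepancy:
\left. -\frac{d}{d\epsilon}\,\mathrm{KL}\!\left(q_{[\epsilon\phi^*]} \,\middle\|\, p\right) \right|_{\epsilon=0}
  = \mathbb{D}_{\mathcal{F}}(q \,\|\, p)
```

The reviewer's point is that once second-order (Newton-like) updates replace this steepest-descent direction, the above identity no longer directly characterizes the decrease rate, so a separate convergence analysis would be needed.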


A Stein variational Newton method

Detommaso, Gianluca, Cui, Tiangang, Marzouk, Youssef, Spantini, Alessio, Scheichl, Robert

Neural Information Processing Systems
