Markovian Score Climbing: Variational Inference with KL(p||q)
Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions q and then finds the member of that family that is closest to the exact posterior p. Traditionally, VI algorithms minimize the "exclusive Kullback-Leibler (KL)" KL(q||p), often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL(p||q), which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL using stochastic gradients with vanishing bias. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
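For reference, the two divergences and the gradient that makes the inclusive KL hard to optimize directly can be written as follows. This assumes q is parameterized by a vector λ (a symbol not used in the text above) and writes p(z | x) for the exact posterior. Because the entropy term of p does not depend on λ, the gradient of the inclusive KL is minus the expected score of q, with the expectation taken under the intractable posterior. That expectation is why samples from p, or from a Markov chain targeting p, are needed.

```latex
\mathrm{KL}(q\|p) = \mathbb{E}_{q(z;\lambda)}\!\left[\log \frac{q(z;\lambda)}{p(z \mid x)}\right],
\qquad
\mathrm{KL}(p\|q) = \mathbb{E}_{p(z \mid x)}\!\left[\log \frac{p(z \mid x)}{q(z;\lambda)}\right]

\mathrm{KL}(p\|q) = \mathbb{E}_{p(z \mid x)}\!\left[\log p(z \mid x)\right]
                  - \mathbb{E}_{p(z \mid x)}\!\left[\log q(z;\lambda)\right]

\nabla_\lambda \mathrm{KL}(p\|q) = -\,\mathbb{E}_{p(z \mid x)}\!\left[\nabla_\lambda \log q(z;\lambda)\right]
```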
Review for NeurIPS paper: Markovian Score Climbing: Variational Inference with KL(p||q)
Relation to Prior Work: Prior work is discussed, but some important related work is missing. Below I list related work that, in my opinion, should be discussed. Wang and colleagues [1] develop a meta-learning approach to learn Gibbs block conditionals. That paper has a different focus and assumes access to samples from the true generative model, but it is still technically related: they optimize an inclusive KL divergence and employ additional MH steps to maintain the correct stationary distribution.
Review for NeurIPS paper: Markovian Score Climbing: Variational Inference with KL(p||q)
This work proposes a novel variant of VI based on a combination of MCMC/SMC and stochastic gradients. The key idea is to use a conditional Markov transition kernel to obtain increasingly refined estimates of the KL gradients. The empirical results are limited to smaller datasets, and it has been pointed out that the paper would be stronger if the scalability of the method were demonstrated through experiments on larger datasets.
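The "conditional Markov transition kernel" in the summary above can be illustrated with a conditional importance sampling (CIS) kernel. This kernel keeps the current state among the candidates, draws the remaining candidates from q, and resamples in proportion to importance weights. The NumPy sketch below is a minimal illustration under stated assumptions. It uses a scalar latent variable and an unnormalized target density, and it assumes a proposal q that can be both sampled and evaluated. The function name `cis_kernel`, its arguments, and the toy target and proposal in the example are hypothetical. The sketch is not claimed to reproduce the authors' implementation.

```python
import numpy as np


def cis_kernel(z_prev, log_p_tilde, q_sample, q_logpdf, num_samples, rng):
    """One step of a conditional importance sampling (CIS) Markov kernel.

    Hypothetical helper for a scalar latent variable. The current state is
    kept as candidate 0, and num_samples - 1 fresh candidates are drawn from q.
    The next state is resampled in proportion to p_tilde(z) / q(z). Because
    the weights are normalized, log_p_tilde and q_logpdf may omit constants.
    """
    proposals = q_sample(num_samples - 1, rng)           # fresh draws from q
    candidates = np.concatenate(([z_prev], proposals))   # condition on z_prev
    log_w = log_p_tilde(candidates) - q_logpdf(candidates)
    w = np.exp(log_w - log_w.max())                      # subtract max for stability
    w /= w.sum()                                         # normalized weights
    j = rng.choice(num_samples, p=w)                     # index of the next state
    return candidates[j]


# Example (toy assumption): target p = N(0, 1), proposal q = N(0, 2^2).
rng = np.random.default_rng(0)
z = 5.0                                                  # deliberately poor start
chain = []
for _ in range(5000):
    z = cis_kernel(
        z,
        lambda x: -0.5 * x ** 2,                         # log p up to a constant
        lambda n, r: 2.0 * r.standard_normal(n),         # sampler for q
        lambda x: -0.5 * (x / 2.0) ** 2,                 # log q up to a constant
        10,
        rng,
    )
    chain.append(z)
```

For any number of candidates S ≥ 2, this kernel leaves p invariant. Because the candidates come from q, a q closer to p tends to yield more even weights and a faster-mixing chain. This is what makes kernels of this kind a natural partner for a variational approximation that is being improved.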
Markovian Score Climbing: Variational Inference with KL(p||q)
Christian A. Naesseth, Fredrik Lindsten, David Blei
Modern variational inference (VI) uses stochastic gradients to avoid intractable expectations, enabling large-scale probabilistic inference in complex models. VI posits a family of approximating distributions $q$ and then finds the member of that family that is closest to the exact posterior $p$. Traditionally, VI algorithms minimize the "exclusive KL" KL$(q\|p)$, often for computational convenience. Recent research, however, has also focused on the "inclusive KL" KL$(p\|q)$, which has good statistical properties that make it more appropriate for certain inference problems. This paper develops a simple algorithm for reliably minimizing the inclusive KL. Consider a valid MCMC method, a Markov chain whose stationary distribution is $p$. The algorithm we develop iteratively samples the chain $z[k]$ and then uses those samples to follow the score function of the variational approximation, $\nabla \log q(z[k])$, with a Robbins-Monro step-size schedule. This method, which we call Markovian score climbing (MSC), converges to a local optimum of the inclusive KL. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final estimates. In a variant that ties the variational approximation directly to the Markov chain, MSC further provides a new algorithm that melds VI and MCMC. We illustrate convergence on a toy model and demonstrate the utility of MSC on Bayesian probit regression for classification as well as a stochastic volatility model for financial data.
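The loop described above can be sketched in a few lines: advance a $p$-invariant Markov chain by one step to get $z[k]$, then take a Robbins-Monro step along $\nabla \log q(z[k])$. Every concrete choice below is a hypothetical assumption made for illustration:

- a one-dimensional Gaussian family $q(z; \mu, \sigma)$ parameterized by $(\mu, \log\sigma)$;
- a toy unnormalized target $\mathcal{N}(3, 2^2)$;
- random-walk Metropolis-Hastings as the "valid MCMC method";
- step sizes $\varepsilon_k = (k + 100)^{-0.7}$, which satisfy $\sum_k \varepsilon_k = \infty$ and $\sum_k \varepsilon_k^2 < \infty$.

The names `msc`, `mh_step`, and `score` are illustrative only.

```python
import math

import numpy as np


def log_p_tilde(z):
    """Hypothetical toy target: unnormalized N(3, 2^2) log density."""
    return -0.5 * ((z - 3.0) / 2.0) ** 2


def mh_step(z, rng, scale=4.0):
    """Random-walk Metropolis-Hastings step; leaves p invariant."""
    z_prop = z + scale * rng.standard_normal()
    log_ratio = log_p_tilde(z_prop) - log_p_tilde(z)
    if rng.uniform() < math.exp(min(0.0, log_ratio)):
        return z_prop
    return z


def score(z, mu, log_sigma):
    """Gradient of log N(z; mu, sigma^2) w.r.t. (mu, log_sigma)."""
    sigma2 = math.exp(2.0 * log_sigma)
    r = z - mu
    return r / sigma2, r * r / sigma2 - 1.0


def msc(num_iters=50_000, seed=0):
    """Markovian score climbing sketch for a 1-D Gaussian q."""
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0   # variational parameters of q
    z = 0.0                    # Markov chain state z[0]
    for k in range(1, num_iters + 1):
        z = mh_step(z, rng)                       # z[k] ~ M(. | z[k-1])
        g_mu, g_ls = score(z, mu, log_sigma)      # score of q at z[k]
        eps = 1.0 / (k + 100) ** 0.7              # Robbins-Monro step size
        mu += eps * g_mu
        log_sigma += eps * g_ls
    return mu, math.exp(log_sigma)


mu_hat, sigma_hat = msc()
```

Within a Gaussian family, the inclusive KL is minimized by matching the mean and variance of $p$. The stationary point of these updates is therefore $\mu = 3$, $\sigma = 2$ for this toy target. The MH kernel here ignores $q$. A kernel that depends on the current $q$, for example a conditional importance sampling kernel that uses $q$ as its proposal, would be closer in spirit to the variant above that ties the variational approximation to the Markov chain.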