 marginal likelihood estimate





We have simplified Figure 3 considerably, removing STL (which uses biased gradients), and removing 2 row C

Neural Information Processing Systems

We would like to thank the reviewers for their kind and thoughtful comments. Any attempt to mitigate particle degeneracy (e.g. Replicating Fig 1BD, we find similar, albeit less extreme results, with TMC always being faster than SMC. In particular, we have included Eq. 36 in the main text, and also included the corresponding choice of This should help to clarify that Eq. 11 applies to any directed graphical model (we have also included references In the example in Figure 1, we consider a model that does not have a chain-structure (see Appendix Figure 1A). IWAE performs arbitrarily badly due to the high-dimensionality of the state-space.



Asynchronous Anytime Sequential Monte Carlo

Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh

Neural Information Processing Systems

We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional sequential Monte Carlo algorithms that is amenable to parallel and distributed implementations. It uses no barrier synchronizations, which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever to emit an unbounded number of particles while keeping within a fixed memory budget. We prove that the particle cascade provides an unbiased marginal likelihood estimator which can be straightforwardly plugged into existing pseudo-marginal methods.
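For context, the following is a minimal sketch of the traditional, synchronous baseline that the abstract contrasts against: a bootstrap particle filter that accumulates a log marginal likelihood estimate. It is not the particle cascade itself. The linear-Gaussian model, the function names, and the parameters `a`, `q_std`, and `r_std` are assumptions chosen only for illustration. The resampling step inside the loop needs the weights of all particles at once, which is the kind of barrier synchronization the particle cascade is described as avoiding.

```python
import numpy as np

def log_normal_pdf(y, mean, std):
    """Log density of N(mean, std^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((y - mean) / std) ** 2

def bootstrap_pf_log_marginal(ys, n_particles=1000, a=0.9,
                              q_std=1.0, r_std=1.0, seed=0):
    """Estimate log p(y_1..y_T) for an assumed toy model:
    x_1 ~ N(0, 1), x_t = a * x_{t-1} + N(0, q_std^2), y_t ~ N(x_t, r_std^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n_particles)
    log_z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate every particle through the transition model.
            x = a * x + rng.normal(0.0, q_std, size=n_particles)
        log_w = log_normal_pdf(y, x, r_std)
        # Average the weights in a numerically stable way (log-sum-exp).
        m = log_w.max()
        w = np.exp(log_w - m)
        log_z += m + np.log(w.mean())
        # Multinomial resampling: requires all weights, hence a barrier.
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]
    return log_z

if __name__ == "__main__":
    print(bootstrap_pf_log_marginal([0.5, -0.2, 1.1]))
```

The product of per-step average weights is an unbiased estimate of the marginal likelihood; its logarithm, which this function returns, is biased downward by Jensen's inequality, so pseudo-marginal methods use the estimate on the original scale.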


Stable Training of Normalizing Flows for High-dimensional Variational Inference

Andrade, Daniel

arXiv.org Machine Learning

Variational inference with normalizing flows (NFs) is an increasingly popular alternative to MCMC methods. In particular, NFs based on coupling layers (Real NVPs) are frequently used due to their good empirical performance. In theory, increasing the depth of normalizing flows should lead to more accurate posterior approximations. However, in practice, training deep normalizing flows for approximating high-dimensional posterior distributions is often infeasible due to the high variance of the stochastic gradients. In this work, we show that previous methods for stabilizing the variance of stochastic gradient descent can be insufficient to achieve stable training of Real NVPs. As the source of the problem, we identify that, during training, samples often exhibit unusually high values. As a remedy, we propose a combination of two methods: (1) soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log transformation of the samples. We evaluate these and other previously proposed modifications on several challenging target distributions, including a high-dimensional horseshoe logistic regression model. Our experiments show that with our modifications, stable training of Real NVPs for posteriors with several thousand dimensions is possible, allowing for more accurate marginal likelihood estimation via importance sampling. Moreover, we evaluate several common training techniques and architecture choices and provide practical advice for training NFs for high-dimensional variational inference.
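The abstract names the two remedies but does not give their formulas, so the sketch below shows one plausible form of each rather than the paper's exact definitions. The function names, the arctan-based clamp, the piecewise soft log, and the parameters `alpha` and `tau` are all assumptions. The clamp keeps a coupling layer's scale output inside a bounded interval, and the soft log is the identity for small inputs and grows logarithmically beyond a threshold while remaining invertible.

```python
import numpy as np

def soft_clamp_scale(s, alpha=2.0):
    """Smoothly squash a raw scale output into (-alpha, alpha).
    Assumed form; the paper's soft-thresholding may differ."""
    return (2.0 / np.pi) * alpha * np.arctan(s / alpha)

def soft_log(z, tau=5.0):
    """Assumed bijective soft log: identity for |z| <= tau,
    logarithmic growth for |z| > tau.
    Returns the transformed value and log|dy/dz| per element."""
    excess = np.maximum(np.abs(z) - tau, 0.0)
    y = np.sign(z) * (np.minimum(np.abs(z), tau) + np.log1p(excess))
    # The derivative is 1 inside the threshold and 1 / (1 + excess) outside.
    log_det = -np.log1p(excess)
    return y, log_det

def soft_log_inverse(y, tau=5.0):
    """Inverse of soft_log, undoing the logarithmic part beyond tau."""
    excess = np.maximum(np.abs(y) - tau, 0.0)
    return np.sign(y) * (np.minimum(np.abs(y), tau) + np.expm1(excess))

if __name__ == "__main__":
    z = np.array([-1e6, -3.0, 0.0, 2.0, 1e6])
    y, log_det = soft_log(z)
    print(y)                    # extreme samples are compressed
    print(soft_log_inverse(y))  # and can be recovered
```

Because the transformation is a bijection with a known log-derivative, it can be appended to a flow as one more layer and its `log_det` added to the flow's log-density correction.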


Auto-Encoding Sequential Monte Carlo

Le, Tuan Anh, Igl, Maximilian, Jin, Tom, Rainforth, Tom, Wood, Frank

arXiv.org Machine Learning

Probabilistic machine learning [Ghahramani, 2015] allows us to model the structure and dependencies of latent variables and observational data as a joint probability distribution. Once a model is defined, we can perform inference to update our prior beliefs about latent variables in light of observed data to obtain the posterior distribution. The posterior can be used to answer any questions we might have about the latent quantities while coherently accounting for our uncertainty about the world. We introduce a method for simultaneous model learning and inference amortization [Gershman and Goodman, 2014], given an unlabeled dataset of observations. The model is specified partially, the rest being specified using a generative network whose weights are to be learned. Inference amortization refers to spending additional time before inference to obtain an amortization artifact which is used to speed up inference during test time.


Early Stopping is Nonparametric Variational Inference

Maclaurin, Dougal, Duvenaud, David, Adams, Ryan P.

arXiv.org Machine Learning

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
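The entropy-tracking idea can be illustrated on a toy problem. The sketch below is not the paper's estimator: it assumes a quadratic loss 0.5 * theta^T A theta, plain (non-stochastic) gradient descent, a Gaussian initial distribution, and an exact log-determinant from `np.linalg.slogdet` in place of a scalable approximation. Under those assumptions each update is the linear map theta -> (I - step * A) theta, so the entropy changes by log|det(I - step * A)| per step, and the tracked value can be compared with the closed-form entropy of the final Gaussian. All names are hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a Gaussian with covariance matrix cov."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def track_entropy_gd(A, init_std=1.0, step=0.1, n_steps=20):
    """Track the entropy of the distribution over parameters while running
    gradient descent on the assumed loss 0.5 * theta^T A theta."""
    d = A.shape[0]
    # Entropy of the initial distribution N(0, init_std^2 * I).
    entropy = gaussian_entropy(init_std**2 * np.eye(d))
    # Jacobian of one update theta -> theta - step * A @ theta.
    jac = np.eye(d) - step * A
    _, log_abs_det = np.linalg.slogdet(jac)
    total_map = np.eye(d)
    for _ in range(n_steps):
        entropy += log_abs_det        # change in entropy from this update
        total_map = jac @ total_map   # composite linear map so far
    # Covariance of the final distribution, used only for checking.
    final_cov = init_std**2 * total_map @ total_map.T
    return entropy, final_cov

if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 0.5])
    tracked, cov = track_entropy_gd(A)
    print(tracked, gaussian_entropy(cov))  # the two values should agree
```

Adding the tracked entropy to the negative expected loss under the final distribution gives a variational lower bound of the kind the abstract describes; for a neural network the Jacobian is not constant and its log-determinant must be estimated at each step.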