- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
We would like to thank the reviewers for their kind and thoughtful comments.
- We have simplified Figure 3 considerably, removing STL (which uses biased gradients), and removing 2 row C
- Any attempt to mitigate particle degeneracy (e.g.
- Replicating Fig 1BD, we find similar, albeit less extreme, results, with TMC always being faster than SMC.
- In particular, we have included Eq. 36 in the main text, and also included the corresponding choice of
- This should help to clarify that Eq. 11 applies to any directed graphical model (we have also included references
- In the example in Figure 1, we consider a model that does not have a chain structure (see Appendix Figure 1A). IWAE performs arbitrarily badly due to the high dimensionality of the state space.
Asynchronous Anytime Sequential Monte Carlo
Brooks Paige, Frank Wood, Arnaud Doucet, Yee Whye Teh
We introduce a new sequential Monte Carlo algorithm we call the particle cascade. The particle cascade is an asynchronous, anytime alternative to traditional sequential Monte Carlo algorithms that is amenable to parallel and distributed implementations. It uses no barrier synchronizations, which leads to improved particle throughput and memory efficiency. It is an anytime algorithm in the sense that it can be run forever, emitting an unbounded number of particles while keeping within a fixed memory budget. We prove that the particle cascade provides an unbiased marginal likelihood estimator, which can be straightforwardly plugged into existing pseudo-marginal methods.
- North America > United States (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
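The unbiasedness claim above rests on a standard property of sequential Monte Carlo: the product over time steps of the mean unnormalized particle weight is an unbiased estimate of the marginal likelihood, which is what makes such estimators usable inside pseudo-marginal MCMC. Below is a minimal sketch of that estimator for ordinary (synchronous) bootstrap SMC on a toy linear-Gaussian state-space model, not the particle cascade itself; the model and all names are illustrative:

```python
import numpy as np

def smc_log_marginal(ys, num_particles=1000, seed=0):
    """Bootstrap SMC on a toy 1-D linear-Gaussian state-space model.

    Returns an unbiased estimate of the marginal likelihood p(y_1:T)
    (reported on the log scale) as the product over time steps of the
    mean unnormalized particle weight.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=num_particles)  # x_1 ~ N(0, 1)
    log_Z = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            # Propagate from the transition prior: x_t ~ N(0.9 x_{t-1}, 1)
            x = 0.9 * x + rng.normal(0.0, 1.0, size=num_particles)
        # Weight by the observation density: y_t ~ N(x_t, 1)
        logw = -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)
        log_Z += np.log(np.mean(np.exp(logw)))
        # Multinomial resampling -- the barrier-synchronized step that
        # the particle cascade reorganizes asynchronously
        w = np.exp(logw)
        idx = rng.choice(num_particles, size=num_particles, p=w / w.sum())
        x = x[idx]
    return log_Z
```

The returned log estimate can be exponentiated and plugged into a pseudo-marginal acceptance ratio in place of the intractable likelihood.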
Stable Training of Normalizing Flows for High-dimensional Variational Inference
Variational inference with normalizing flows (NFs) is an increasingly popular alternative to MCMC methods. In particular, NFs based on coupling layers (Real NVPs) are frequently used due to their good empirical performance. In theory, increasing the depth of normalizing flows should lead to more accurate posterior approximations. However, in practice, training deep normalizing flows for approximating high-dimensional posterior distributions is often infeasible due to the high variance of the stochastic gradients. In this work, we show that previous methods for stabilizing the variance of stochastic gradient descent can be insufficient to achieve stable training of Real NVPs. As the source of the problem, we identify that, during training, samples often exhibit unusually high values. As a remedy, we propose a combination of two methods: (1) soft-thresholding of the scale in Real NVPs, and (2) a bijective soft log transformation of the samples. We evaluate these and other previously proposed modifications on several challenging target distributions, including a high-dimensional horseshoe logistic regression model. Our experiments show that with our modifications, stable training of Real NVPs for posteriors with several thousand dimensions is possible, allowing for more accurate marginal likelihood estimation via importance sampling. Moreover, we evaluate several common training techniques and architecture choices and provide practical advice for training NFs for high-dimensional variational inference.
- North America > United States (0.14)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
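The two remedies named in the abstract can be sketched directly; the exact parameterizations below are assumptions for illustration (the paper's precise forms may differ). A saturating soft-threshold keeps a coupling layer's log-scale bounded so the multiplicative scale cannot explode, and a bijective soft-log transform compresses the occasional extreme samples that would otherwise dominate the gradient:

```python
import numpy as np

def soft_threshold_scale(s, bound=2.0):
    """Soft-clamp the raw log-scale output s of a coupling layer to
    (-bound, bound), so exp(scale) stays in [exp(-bound), exp(bound)]
    and cannot blow up mid-training."""
    return bound * np.tanh(s / bound)

def soft_log(x):
    """Bijective soft-log transform: approximately linear near zero and
    logarithmic in the tails, so rare extreme samples are compressed
    instead of dominating the stochastic gradient."""
    return np.sign(x) * np.log1p(np.abs(x))

def soft_log_inverse(y):
    """Exact inverse of soft_log, so the transform remains bijective and
    can be absorbed into the flow with a tractable Jacobian."""
    return np.sign(y) * np.expm1(np.abs(y))
```

Because both maps are smooth and invertible, their log-determinant-Jacobian terms can be added to the flow's density computation in the usual way.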
Auto-Encoding Sequential Monte Carlo
Le, Tuan Anh, Igl, Maximilian, Jin, Tom, Rainforth, Tom, Wood, Frank
Probabilistic machine learning [Ghahramani, 2015] allows us to model the structure and dependencies of latent variables and observational data as a joint probability distribution. Once a model is defined, we can perform inference to update our prior beliefs about latent variables in light of observed data to obtain the posterior distribution. The posterior can be used to answer any questions we might have about the latent quantities while coherently accounting for our uncertainty about the world. We introduce a method for simultaneous model learning and inference amortization [Gershman and Goodman, 2014], given an unlabeled dataset of observations. The model is only partially specified, the remainder being represented by a generative network whose weights are to be learned. Inference amortization refers to spending additional time before inference to obtain an amortization artifact which is used to speed up inference during test time.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
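Inference amortization as described above can be illustrated on a tiny conjugate model: spend time up front fitting a recognition model on pairs simulated from the joint, then answer test-time queries with a single cheap evaluation. This is a minimal sketch of the general idea, not the AESMC method; the model and all names are invented for the example:

```python
import numpy as np

def train_amortized_posterior(num_pairs=20000, seed=0):
    """Fit an amortization artifact for the toy conjugate model
        x ~ N(0, 1),   y | x ~ N(x, 1),
    by least squares on pairs simulated from the joint. The exact
    posterior mean is y / 2, so the learned weight should approach 0.5.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=num_pairs)       # latents drawn from the prior
    y = x + rng.normal(size=num_pairs)   # observations from the likelihood
    # Linear recognition model: E[x | y] ~= w * y + b
    A = np.stack([y, np.ones_like(y)], axis=1)
    (w, b), *_ = np.linalg.lstsq(A, x, rcond=None)
    return w, b

w, b = train_amortized_posterior()
# Test-time inference is now a single cheap evaluation: w * y_observed + b
```

The up-front simulation and fitting cost is the "additional time before inference"; at test time the artifact replaces a per-datapoint inference run.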
Early Stopping is Nonparametric Variational Inference
Maclaurin, Dougal, Duvenaud, David, Adams, Ryan P.
We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)
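The entropy-tracking idea in the abstract can be made concrete on a toy model where everything is analytic. Each deterministic gradient update transforms the sample distribution, changing its entropy by the log-determinant of the update's Jacobian; summing those changes alongside the running samples yields a stochastic lower bound on the log marginal likelihood. Below is a toy sketch under assumed simplifications (1-D Gaussian posterior, full-batch gradients), not the paper's estimator:

```python
import numpy as np

def sgd_variational_bound(steps, lr=0.1, sigma0=3.0, num_samples=100000, seed=0):
    """Interpret unconverged gradient ascent as variational inference.

    Toy model: log p(x, D) = -x^2/2 - log sqrt(2*pi), so the posterior is
    N(0, 1) and the true log marginal likelihood is 0. The gradient step
    x' = x + lr * d/dx log p(x, D) is the linear map x' = (1 - lr) x, so
    each step changes the entropy of the implicit sample distribution by
    log|1 - lr|, the log-determinant of the update's Jacobian.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma0, size=num_samples)        # initial q_0
    entropy = 0.5 * np.log(2 * np.pi * np.e * sigma0 ** 2)
    for _ in range(steps):
        x = x + lr * (-x)                 # gradient ascent on log p(x, D)
        entropy += np.log(abs(1.0 - lr))  # entropy change via log|det J|
    log_joint = -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)
    return np.mean(log_joint) + entropy   # ELBO <= log p(D) = 0
```

In this toy setting the bound peaks at an intermediate number of steps and then deteriorates as the samples over-concentrate, which is the early-stopping intuition the abstract formalizes.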