LEE, HOLDEN, Risteski, Andrej, Ge, Rong

A key task in Bayesian machine learning is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). One prevalent example of this is sampling posteriors in parametric distributions, such as latent-variable generative models. However sampling (even very approximately) can be #P-hard. Classical results (going back to Bakry and Emery) on sampling focus on log-concave distributions, and show a natural Markov chain called Langevin diffusion mix in polynomial time. However, all log-concave distributions are uni-modal, while in practice it is very common for the distribution of interest to have multiple modes. In this case, Langevin diffusion suffers from torpid mixing. We address this problem by combining Langevin diffusion with simulated tempering. The result is a Markov chain that mixes more rapidly by transitioning between different temperatures of the distribution. We analyze this Markov chain for a mixture of (strongly) log-concave distributions of the same shape. In particular, our technique applies to the canonical multi-modal distribution: a mixture of gaussians (of equal variance). Our algorithm efficiently samples from these distributions given only access to the gradient of the log-pdf. To the best of our knowledge, this is the first result that proves fast mixing for multimodal distributions.

LEE, HOLDEN, Risteski, Andrej, Ge, Rong

van de Meent, Jan-Willem, Paige, Brooks, Wood, Frank

In this paper we demonstrate that tempering Markov chain Monte Carlo samplers for Bayesian models by recursively subsampling observations without replacement can improve the performance of baseline samplers in terms of effective sample size per computation. We present two tempering by subsampling algorithms, subsampled parallel tempering and subsampled tempered transitions. We provide an asymptotic analysis of the computational cost of tempering by subsampling, verify that tempering by subsampling costs less than traditional tempering, and demonstrate both algorithms on Bayesian approaches to learning the mean of a high dimensional multivariate Normal and estimating Gaussian process hyperparameters.

Monte Carlo (MC) sampling methods are widely applied in Bayesian inference, system simulation and optimization problems. The Markov Chain Monte Carlo (MCMC) algorithms are a well-known class of MC methods which generate a Markov chain with the desired invariant distribution. In this document, we focus on the Metropolis-Hastings (MH) sampler, which can be considered as the atom of the MCMC techniques, introducing the basic notions and different properties. We describe in details all the elements involved in the MH algorithm and the most relevant variants. Several improvements and recent extensions proposed in the literature are also briefly discussed, providing a quick but exhaustive overview of the current Metropolis-based sampling's world.

Holden, Matthew, Pereyra, Marcelo, Zygalakis, Konstantinos C.

This paper proposes a new methodology for performing Bayesian inference in imaging inverse problems where the prior knowledge is available in the form of training data. Following the manifold hypothesis and adopting a generative modelling approach, we construct a data-driven prior that is supported on a sub-manifold of the ambient space, which we can learn from the training data by using a variational autoencoder or a generative adversarial network. We establish the existence and well-posedness of the associated posterior distribution and posterior moments under easily verifiable conditions, providing a rigorous underpinning for Bayesian estimators and uncertainty quantification analyses. Bayesian computation is performed by using a parallel tempered version of the preconditioned Crank-Nicolson algorithm on the manifold, which is shown to be ergodic and robust to the non-convex nature of these data-driven models. In addition to point estimators and uncertainty quantification analyses, we derive a model misspecification test to automatically detect situations where the data-driven prior is unreliable, and explain how to identify the dimension of the latent space directly from the training data. The proposed approach is illustrated with a range of experiments with the MNIST dataset, where it outperforms alternative image reconstruction approaches from the state of the art. A model accuracy analysis suggests that the Bayesian probabilities reported by the data-driven models are also remarkably accurate under a frequentist definition of probability.