Parallelizing MCMC with Random Partition Trees

Neural Information Processing Systems

The modern scale of data has brought new challenges to Bayesian inference. In particular, conventional MCMC algorithms are computationally very expensive on large data sets. A promising approach to this problem is embarrassingly parallel MCMC (EP-MCMC), which first partitions the data into multiple subsets and runs independent sampling algorithms on each subset. The subset posterior draws are then aggregated via a combining rule to obtain the final approximation. Existing EP-MCMC algorithms are limited by approximation accuracy and difficulty in resampling. In this article, we propose a new EP-MCMC algorithm, PART, that addresses these problems. The new algorithm applies random partition trees to combine the subset posterior draws; the approach is distribution-free, easy to resample from, and able to adapt to multiple scales. We provide theoretical justification and extensive experiments illustrating empirical performance.
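The EP-MCMC workflow described above (partition, sample independently, combine) can be sketched on a toy Gaussian mean problem. The sketch below uses the simple parametric Gaussian combining rule rather than PART's partition-tree aggregation, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: infer the mean of N(theta, 1) data under a flat prior.
data = rng.normal(2.0, 1.0, size=1000)
subsets = np.array_split(data, 4)  # step 1: partition the data

def mh_sample(x, n_iter=5000, step=0.2):
    """Random-walk Metropolis targeting the subset posterior p(theta | x)."""
    logp = lambda t: -0.5 * np.sum((x - t) ** 2)  # Gaussian log-likelihood
    theta, cur = 0.0, None
    cur = logp(theta)
    draws = []
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        new = logp(prop)
        if np.log(rng.uniform()) < new - cur:
            theta, cur = prop, new
        draws.append(theta)
    return np.array(draws[1000:])  # discard burn-in

# step 2: run independent samplers on each subset (parallelizable)
subset_draws = [mh_sample(s) for s in subsets]

# step 3: combine the subset draws; here via the Gaussian rule:
# combined precision = sum of subset precisions, mean = precision-weighted
prec = np.array([1.0 / d.var() for d in subset_draws])
mu = np.array([d.mean() for d in subset_draws])
post_var = 1.0 / prec.sum()
post_mean = post_var * (prec * mu).sum()
print(post_mean, post_var)
```

Because subset posteriors here are Gaussian, the parametric rule is exact up to Monte Carlo error; PART's tree-based aggregation is designed for the general, non-Gaussian case.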

Parallelising MCMC via Random Forests

Machine Learning

The Markov chain Monte Carlo (MCMC) algorithm, a generic sampling method, is ubiquitous in modern statistics, especially in Bayesian fields. MCMC algorithms require only pointwise evaluation of the target density, up to a multiplicative constant, in order to sample from it. In Bayesian analysis the object of main interest is the posterior, which is generally not available in closed form, and MCMC has become a standard tool in this domain. However, MCMC is difficult to scale and its applications are limited when the observation size is very large, since it needs to sweep over the entire observation set to evaluate the likelihood function at each iteration. Recently, many methods have been proposed to better scale MCMC algorithms to big data sets, and these can be roughly classified into two groups (Bardenet et al., 2017): divide-and-conquer methods and subsampling-based methods. In divide-and-conquer methods, one splits the whole data set into subsets, runs MCMC over each subset to generate samples of the parameters, and combines these to produce an approximation of the true posterior. Depending on how MCMC is handled over the subsets, these methods can be further classified into two sub-categories.
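The subsampling side of this taxonomy can be illustrated with a deliberately naive sketch: replace the full-data log-likelihood sweep with a minibatch estimate scaled by N/m. Plugging such a noisy estimate directly into Metropolis-Hastings perturbs the stationary distribution, which is precisely what the subsampling literature works to correct; the snippet shows only the unbiased estimator, on an assumed toy Gaussian model:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000
data = rng.normal(1.0, 1.0, size=N)

def full_loglik(theta):
    """Full-data log-likelihood sweep: one pass over all N observations."""
    return -0.5 * np.sum((data - theta) ** 2)

def subsampled_loglik(theta, m=500):
    """Unbiased estimate from a minibatch of size m, scaled by N / m."""
    idx = rng.integers(0, N, size=m)  # sample indices with replacement
    return (N / m) * -0.5 * np.sum((data[idx] - theta) ** 2)

# the minibatch estimator is unbiased for the full log-likelihood
est = np.mean([subsampled_loglik(1.0) for _ in range(200)])
print(est, full_loglik(1.0))
```

The estimator's variance, not its mean, is the obstacle: a Metropolis-Hastings acceptance ratio built from noisy log-likelihoods no longer satisfies detailed balance for the true posterior.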

Efficient Bayesian Inference for a Gaussian Process Density Model

Machine Learning

We reconsider a nonparametric density model based on Gaussian processes. By augmenting the model with latent Pólya-Gamma random variables and a latent marked Poisson process, we obtain a new likelihood which is conjugate to the model's Gaussian process prior. The augmented posterior allows for efficient inference by Gibbs sampling and an approximate variational mean-field approach. For the latter we utilise sparse GP approximations to tackle the infinite dimensionality of the problem. The performance of both algorithms, and comparisons with other density estimators, are demonstrated on artificial and real datasets with up to several thousand data points.
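As a toy illustration of this model class, assuming a density proportional to a sigmoid-transformed GP draw times a base measure (a sketch of the prior only, not the authors' augmentation-based inference), one can draw a GP sample on a grid and normalise numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
grid = np.linspace(-3, 3, 200)

def rbf_kernel(x, y, ls=0.7, var=4.0):
    """Squared-exponential covariance on a 1-D grid (illustrative choice)."""
    return var * np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ls**2)

K = rbf_kernel(grid, grid) + 1e-6 * np.eye(grid.size)  # jitter for stability
f = rng.multivariate_normal(np.zeros(grid.size), K)    # one GP prior draw

base = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)     # N(0, 1) base measure
unnorm = base / (1.0 + np.exp(-f))                     # sigmoid link
dx = grid[1] - grid[0]
density = unnorm / (unnorm.sum() * dx)                 # normalise on the grid
print(density.sum() * dx)  # integrates to 1 on the grid
```

The point of the paper's augmentation is exactly to avoid this brute-force normalisation: the Pólya-Gamma and marked-Poisson latent variables render the likelihood conjugate to the GP prior.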

Exact slice sampler for Hierarchical Dirichlet Processes

Machine Learning

We propose an exact slice sampler for the hierarchical Dirichlet process (HDP) and its associated mixture models (Teh et al., 2006). Although there are existing MCMC algorithms for sampling from the HDP, a slice sampler has been missing from the literature. Slice sampling is well known for its desirable properties, including fast mixing and a natural potential for parallelization. On the other hand, the hierarchical nature of HDPs poses challenges to adopting a full-fledged slice sampler that automatically truncates all the infinite measures involved without ad-hoc modifications. In this work, we adopt the powerful idea of Bayesian variable augmentation to address this challenge. By introducing new latent variables, we obtain a full factorization of the joint distribution that is suitable for slice sampling. Our algorithm has several appealing features: (1) fast mixing; (2) remaining exact while allowing natural truncation of the underlying infinite-dimensional measures, as in Kalli et al. (2011), resulting in updates of only a finite number of necessary atoms and weights in each iteration; and (3) being naturally suited to parallel implementations. The underlying principle for joint factorization of the full likelihood is simple and can be applied to many other settings, such as designing sampling algorithms for general dependent Dirichlet process (DDP) models.
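The basic slice-sampling mechanism the abstract builds on is easy to show on a univariate target; the sketch below is Neal-style stepping-out with shrinkage on a standard Gaussian, not the HDP sampler itself:

```python
import numpy as np

rng = np.random.default_rng(3)

def slice_sample(logp, x0, n_iter=5000, w=1.0):
    """Univariate slice sampler (Neal, 2003): stepping-out + shrinkage."""
    x = x0
    draws = []
    for _ in range(n_iter):
        # 1. draw an auxiliary height uniformly under the density at x
        logu = logp(x) + np.log(rng.uniform())
        # 2. step out an interval [l, r] that brackets the slice
        l = x - w * rng.uniform()
        r = l + w
        while logp(l) > logu:
            l -= w
        while logp(r) > logu:
            r += w
        # 3. sample uniformly from [l, r], shrinking the interval on rejects
        while True:
            x1 = rng.uniform(l, r)
            if logp(x1) > logu:
                x = x1
                break
            if x1 < x:
                l = x1
            else:
                r = x1
        draws.append(x)
    return np.array(draws)

draws = slice_sample(lambda t: -0.5 * t * t, 0.0)  # target: N(0, 1)
print(draws.mean(), draws.std())
```

The auxiliary height here plays the same role as the paper's latent variables: conditioning on it turns sampling from the target into sampling uniformly from a (finite) slice.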

Pseudo-Extended Markov chain Monte Carlo

Neural Information Processing Systems

Sampling from posterior distributions using Markov chain Monte Carlo (MCMC) methods can require an excessive number of iterations, particularly when the posterior is multi-modal, as the MCMC sampler can become trapped in a local mode for a large number of iterations. In this paper, we introduce the pseudo-extended MCMC method as a simple approach for improving the mixing of the MCMC sampler for multi-modal posterior distributions. The method augments the state space of the posterior with pseudo-samples as auxiliary variables. On the extended space, the modes of the posterior are connected, which allows the MCMC sampler to move easily between well-separated posterior modes. We demonstrate that the pseudo-extended approach delivers improved MCMC sampling over the Hamiltonian Monte Carlo algorithm on multi-modal posteriors, including Boltzmann machines and models with sparsity-inducing priors.
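The trapping problem motivating the method is easy to reproduce: a random-walk Metropolis chain started in one mode of a well-separated bimodal target essentially never visits the other. This sketch only illustrates the failure mode; it does not implement the pseudo-extended construction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Well-separated bimodal target: equal mixture of N(-4, 0.5^2) and N(4, 0.5^2)
def logp(x):
    return np.logaddexp(-0.5 * ((x + 4) / 0.5) ** 2,
                        -0.5 * ((x - 4) / 0.5) ** 2)

def rw_mh(step, n_iter=20_000, x0=-4.0):
    """Random-walk Metropolis started in the left-hand mode."""
    x, cur = x0, logp(x0)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.normal()
        new = logp(prop)
        if np.log(rng.uniform()) < new - cur:
            x, cur = prop, new
        draws[i] = x
    return draws

draws = rw_mh(step=0.5)
frac_right = np.mean(draws > 0)  # time spent in the right-hand mode
print(frac_right)  # ~0.5 for a well-mixed chain; typically near 0 here
```

Crossing the low-probability valley between the modes requires an exponentially unlikely sequence of uphill acceptances, which is why mode-connecting constructions on an extended space (or gradient-based samplers with tempering) are needed.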