Consensus Monte Carlo for Random Subsets using Shared Anchors

Ni, Yang; Ji, Yuan; Mueller, Peter

arXiv.org Machine Learning 

We develop a consensus Monte Carlo (CMC) algorithm for Bayesian nonparametric (BNP) inference with datasets that are too large for full posterior simulation on a single machine because of CPU or memory limitations. The proposed algorithm targets inference under BNP models for random subsets, including clustering, feature allocation (FA), and related models. We distribute a large dataset across multiple machines, run separate instances of Markov chain Monte Carlo (MCMC) simulation in parallel, and then aggregate the Monte Carlo samples across machines. The proposed CMC hinges on choosing a portion of the observations as anchor points (Kunkel and Peruggia, 2018), which are distributed to every machine along with other observations that are each available to only one machine. The anchor points then serve as a common reference for merging Monte Carlo draws of clusters or features across machines.
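The shared-anchor idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the merging criterion, so the sketch swaps in a simple rule (two clusters from different machines are merged if they contain a common anchor point). The function names `make_shards` and `merge_by_anchors` are hypothetical, and the per-machine MCMC step is assumed to have already produced one clustering per shard.

```python
import random


def make_shards(n_obs, n_anchors, n_shards, seed=0):
    """Split indices 0..n_obs-1 into shards that all contain the same anchors.

    Anchors are chosen uniformly at random (an assumption); every other
    observation is assigned to exactly one shard.
    """
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    anchors = sorted(idx[:n_anchors])
    rest = idx[n_anchors:]
    # Round-robin assignment of the non-anchor observations to shards.
    shards = [sorted(anchors + rest[k::n_shards]) for k in range(n_shards)]
    return anchors, shards


def merge_by_anchors(shard_clusters, anchors):
    """Merge clusters across shards whenever they share an anchor point.

    shard_clusters: one list per shard, each a list of sets of observation
    indices (e.g. a single MCMC draw of a partition on that shard).
    Illustrative rule only: any shared anchor triggers a merge.
    """
    anchor_set = set(anchors)
    parent = {}  # union-find forest over (shard, cluster) identifiers

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    owner = {}  # anchor index -> first cluster seen that contains it
    for s, clusters in enumerate(shard_clusters):
        for c, members in enumerate(clusters):
            node = (s, c)
            parent[node] = node
            for a in members & anchor_set:
                if a in owner:
                    union(owner[a], node)
                else:
                    owner[a] = node

    merged = {}
    for s, clusters in enumerate(shard_clusters):
        for c, members in enumerate(clusters):
            merged.setdefault(find((s, c)), set()).update(members)
    return sorted(sorted(m) for m in merged.values())


# Toy usage: observations 0 and 1 are anchors seen by both machines.
machine_1 = [{0, 2, 3}, {1, 4}]
machine_2 = [{0, 5}, {1, 6, 7}]
global_clusters = merge_by_anchors([machine_1, machine_2], anchors=[0, 1])
```

Under this simplified rule, a cluster that contains no anchor is never merged and is carried over unchanged, which is one reason a practical criterion would likely need to be more refined than the one sketched here.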
