Consensus Monte Carlo for Random Subsets using Shared Anchors
Ni, Yang; Ji, Yuan; Mueller, Peter
We develop a consensus Monte Carlo (CMC) algorithm for Bayesian nonparametric (BNP) inference with datasets that are too large for full posterior simulation on a single machine because of CPU or memory limitations. The proposed algorithm targets inference under BNP models for random subsets, including clustering, feature allocation (FA), and related models. We distribute a large dataset across multiple machines, run separate Markov chain Monte Carlo (MCMC) simulations in parallel, and then aggregate the Monte Carlo samples across machines. The proposed CMC hinges on choosing a portion of the observations as anchor points (Kunkel and Peruggia, 2018). The anchor points are distributed to every machine, whereas each remaining observation is available to only one machine. Because every machine sees the anchor points, they provide a common reference for merging Monte Carlo draws of clusters or features across machines.
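The data layout and merge step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The function names `make_shards` and `merge_by_anchors` are hypothetical, and the merge rule used here (two local clusters from different machines are joined whenever they contain the same anchor point) is a simplifying assumption made only for illustration. The per-machine MCMC step is omitted; the sketch takes one draw of local cluster labels per machine as given.

```python
import random


def make_shards(n_obs, n_shards, n_anchors, seed=0):
    """Split observation indices into shards that all share the same anchors.

    Hypothetical helper: anchors are chosen uniformly at random, and the
    remaining observations are dealt out so each belongs to exactly one shard.
    """
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    anchors = sorted(idx[:n_anchors])
    rest = idx[n_anchors:]
    shards = [anchors + sorted(rest[k::n_shards]) for k in range(n_shards)]
    return anchors, shards


def merge_by_anchors(anchors, shard_labels):
    """Merge local clusters across shards using shared anchor points.

    shard_labels[s] maps an observation index to its local cluster label on
    shard s (one MCMC draw per shard). Assumed rule, for illustration only:
    local clusters that contain the same anchor are unioned (union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every anchor appears on every shard, so it links one local cluster
    # per shard into a single global cluster.
    for a in anchors:
        keys = [(s, labels[a]) for s, labels in enumerate(shard_labels)]
        for k in keys[1:]:
            union(keys[0], k)

    # Global label of an observation = root of its (shard, local label) node.
    merged = {}
    for s, labels in enumerate(shard_labels):
        for obs, lab in labels.items():
            merged[obs] = find((s, lab))
    return merged


# Usage: two machines, anchors 0 and 1, one local clustering draw each.
draw_machine_0 = {0: "a", 1: "b", 2: "a", 3: "b"}
draw_machine_1 = {0: "x", 1: "y", 4: "x", 5: "y"}
merged = merge_by_anchors([0, 1], [draw_machine_0, draw_machine_1])
```

In this toy usage, observations 2 and 4 never appear on the same machine, yet they end up in the same merged cluster because both were clustered locally with anchor 0. A practical merge rule would likely need to tolerate disagreement between machines about how the anchors are grouped, which this sketch does not attempt.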
Jun-28-2019
- Country:
- North America > United States > Texas (0.14)
- Genre:
- Research Report (0.82)
- Industry:
- Health & Medicine > Therapeutic Area
- Cardiology/Vascular Diseases (1.00)
- Immunology (1.00)
- Nephrology (0.67)
- Oncology (0.93)