Distributed Inference for Latent Dirichlet Allocation
–Neural Information Processing Systems
We investigate the problem of learning a widely used latent-variable model, the Latent Dirichlet Allocation (LDA) or "topic" model, using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single-processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across P processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora, we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning.
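To make the first scheme concrete, the following is a minimal sketch of local collapsed Gibbs sampling with periodic count merging, simulated serially in one process. Several details are assumptions, not taken from the abstract: the function names (`gibbs_sweep`, `distributed_lda`), the hyperparameter values, the round-robin sharding of documents, and in particular the merge rule, which adds each shard's change in topic-word counts back into the global counts after every sweep.

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_wk, n_k, alpha, beta, rng):
    """One collapsed Gibbs sweep over a shard; all count arrays are updated in place."""
    V = n_wk.shape[0]
    for d, words in enumerate(docs):
        for i, w in enumerate(words):
            k = z[d][i]
            # Remove the current assignment of this token from the counts.
            n_dk[d, k] -= 1; n_wk[w, k] -= 1; n_k[k] -= 1
            # Conditional distribution over topics for this token.
            p = (n_dk[d] + alpha) * (n_wk[w] + beta) / (n_k + V * beta)
            k = rng.choice(len(p), p=p / p.sum())
            z[d][i] = k
            n_dk[d, k] += 1; n_wk[w, k] += 1; n_k[k] += 1

def distributed_lda(docs, V, K, P, iters=20, alpha=0.1, beta=0.01, seed=0):
    """Simulate P processors serially (hypothetical helper, not the authors' code)."""
    rng = np.random.default_rng(seed)
    shards = [docs[p::P] for p in range(P)]  # assumed round-robin split
    z = [[rng.integers(K, size=len(doc)) for doc in shard] for shard in shards]
    n_dk = [np.zeros((len(shard), K), dtype=np.int64) for shard in shards]
    n_wk = np.zeros((V, K), dtype=np.int64)  # global topic-word counts
    for p, shard in enumerate(shards):
        for d, doc in enumerate(shard):
            for i, w in enumerate(doc):
                n_dk[p][d, z[p][d][i]] += 1
                n_wk[w, z[p][d][i]] += 1
    for _ in range(iters):
        local = []
        for p in range(P):  # on a real cluster these sweeps would run in parallel
            n_wk_p = n_wk.copy()            # each processor starts from the global counts
            n_k_p = n_wk_p.sum(axis=0)
            gibbs_sweep(shards[p], z[p], n_dk[p], n_wk_p, n_k_p, alpha, beta, rng)
            local.append(n_wk_p)
        # Assumed merge rule: fold every shard's count changes into the global counts.
        n_wk = n_wk + sum(n_wk_p - n_wk for n_wk_p in local)
    return n_wk, n_dk
```

Called as `distributed_lda(docs, V=6, K=2, P=3)` on a list of word-id lists, the function returns the merged topic-word counts and the per-shard document-topic counts. Because each shard only reassigns its own tokens, the merged counts stay consistent with the assignments; the approximation lies in each shard sampling against counts that are stale with respect to the other shards during a sweep.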
Apr-6-2023, 14:37:34 GMT