Latent Dirichlet Allocation

Blei, David M., Ng, Andrew Y., Jordan, Michael I.

Neural Information Processing Systems 

We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms.
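The generative story in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `alpha` (Dirichlet parameter over topics), `beta` (one word distribution per topic), and `theta` (a document's topic proportions) are assumptions introduced here, as is the reading that each word draws its own topic from the document's proportions. The Dirichlet draw uses the standard construction of normalizing independent Gamma variates. Variational inference and learning are not shown.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, beta, n_words, rng):
    """Sample one document under the assumed generative process.

    alpha: list of K positive Dirichlet parameters (hypothetical name)
    beta:  K lists of length V, each a topic's distribution over word ids
    """
    # Latent, continuous-valued mixture proportions for this document.
    theta = sample_dirichlet(alpha, rng)
    topic_ids = range(len(alpha))
    vocab_ids = range(len(beta[0]))
    # Each word first picks a topic according to theta...
    topics = rng.choices(topic_ids, weights=theta, k=n_words)
    # ...then picks a word id from that topic's word distribution.
    words = [rng.choices(vocab_ids, weights=beta[z], k=1)[0] for z in topics]
    return theta, topics, words

# Toy setup: 3 topics over an 8-word vocabulary, with randomly drawn topics.
rng = random.Random(0)
K, V = 3, 8
alpha = [0.5] * K
beta = [sample_dirichlet([1.0] * V, rng) for _ in range(K)]
theta, topics, words = generate_document(alpha, beta, 20, rng)
```

Here `theta` sums to one and differs from document to document, which is what distinguishes this setup from a mixture of unigrams, where a whole document would share a single topic.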
