The Author-Topic Model for Authors and Documents

arXiv.org Machine Learning

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
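
The generative process described above can be illustrated with a minimal sketch. It is not the authors' implementation. The toy sizes, the Dirichlet parameters used to draw example distributions, the equal weighting of a document's authors, and all function names are assumptions made for illustration only. The entropy function is one plausible reading of "entropy of author output", computed here over an author's topic distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (not taken from the paper).
n_authors, n_topics, vocab_size = 3, 4, 10

# Each author has a multinomial distribution over topics (each row sums to 1).
theta = rng.dirichlet(np.full(n_topics, 0.5), size=n_authors)
# Each topic has a multinomial distribution over words (each row sums to 1).
phi = rng.dirichlet(np.full(vocab_size, 0.5), size=n_topics)


def document_topic_mixture(author_ids):
    """Topic distribution of a document as a mixture over its authors.

    Assumption: every author of the document gets equal weight.
    """
    return theta[author_ids].mean(axis=0)


def generate_document(author_ids, n_words):
    """Sample word ids: pick an author, then a topic, then a word."""
    words = np.empty(n_words, dtype=int)
    for i in range(n_words):
        a = rng.choice(author_ids)                    # author, uniform (assumed)
        z = rng.choice(n_topics, p=theta[a])          # topic from that author
        words[i] = rng.choice(vocab_size, p=phi[z])   # word from that topic
    return words


def topic_entropy(author_id):
    """Shannon entropy (in nats) of one author's topic distribution."""
    p = theta[author_id]
    p = p[p > 0]  # skip zero entries so log is defined
    return float(-(p * np.log(p)).sum())


mixture = document_topic_mixture([0, 2])
doc = generate_document([0, 2], n_words=20)
print(mixture, doc, topic_entropy(0))
```

Under these assumptions, sampling an author uniformly for each word and then a topic from that author gives the same marginal topic distribution as `document_topic_mixture`. An author who writes on few topics yields a low entropy, and one who spreads across many topics yields a value closer to `log(n_topics)`.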