Syntactic Topic Models

Boyd-Graber, Jordan L., Blei, David M.

Neural Information Processing Systems 

We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents.