Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

Dec-31-2009–Neural Information Processing Systems

We present a nonparametric hierarchical Bayesian model of document collections that decouples sparsity and smoothness in the component distributions (i.e., the "topics"). In the sparse topic model (sparseTM), each topic is represented by a bank of selector variables that determine which terms appear in the topic. Thus each topic is associated with a subset of the vocabulary, and topic smoothness is modeled on this subset. We develop an efficient Gibbs sampler for the sparseTM that includes a general-purpose method for sampling from a Dirichlet mixture with a combinatorial number of components. We demonstrate the sparseTM on four real-world datasets. Compared to traditional approaches, the empirical results show that sparseTMs give better predictive performance with simpler inferred models.
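To make the separation of sparsity and smoothness concrete, here is a minimal sketch of drawing a single sparse topic. It assumes one Bernoulli selector per vocabulary term with a shared inclusion probability `pi` (sparsity) and a symmetric Dirichlet with concentration `gamma` placed only over the selected terms (smoothness). The function name, the parameters, and the "redraw if no term is selected" rule are hypothetical illustration choices, not the paper's exact specification or its Gibbs sampler. The Dirichlet draw uses the standard normalized-Gamma construction.

```python
import random

def sample_sparse_topic(vocab_size, pi, gamma, rng=None):
    """Draw one sparse topic as (selector bits, term distribution).

    Hypothetical sketch: each term is switched on with probability pi;
    a symmetric Dirichlet(gamma) is placed only on the switched-on terms.
    """
    if not 0.0 < pi <= 1.0 or gamma <= 0.0:
        raise ValueError("need 0 < pi <= 1 and gamma > 0")
    rng = rng or random.Random()
    # Bank of selector variables: one Bernoulli(pi) bit per vocabulary term.
    # Assumption for this sketch: redraw if every bit is off, so the topic
    # always covers at least one term.
    while True:
        selectors = [rng.random() < pi for _ in range(vocab_size)]
        if any(selectors):
            break
    # Dirichlet over the selected subset via normalized Gamma(gamma, 1) draws;
    # unselected terms get weight exactly zero.
    weights = [rng.gammavariate(gamma, 1.0) if on else 0.0 for on in selectors]
    total = sum(weights)
    topic = [w / total for w in weights]
    return selectors, topic

selectors, topic = sample_sparse_topic(10, 0.3, 0.5, random.Random(0))
```

Terms whose selector is off receive exactly zero probability, while `gamma` only controls how evenly mass is spread among the terms that remain, which is the decoupling the abstract describes.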

machine learning, natural language, sparsetm
