Latent Tree Models for Hierarchical Topic Detection
Chen, Peixian; Zhang, Nevin L.; Liu, Tengfei; Poon, Leonard K. M.; Chen, Zhourong; Khawar, Farhan
We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables, with those at the lowest latent level representing word co-occurrence patterns and those at higher levels representing co-occurrence of patterns at the level below. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. Unlike LDA-based topic models, HLTMs do not refer to a document generation process and use word variables instead of token variables. They use a tree structure to model the relationships between topics and words, which is conducive to the discovery of meaningful topics and topic hierarchies.
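To make the idea of a soft partition concrete, the following minimal sketch treats a single binary latent variable Z with binary word variables as its children, and computes the posterior probability that each document belongs to the Z=1 cluster. It is an illustration under stated assumptions, not the authors' learning algorithm: the vocabulary, the conditional probabilities in `p_word_given_z`, the uniform prior, and the function names `binarize` and `posterior_z1` are all hypothetical, and a real HLTM would learn both the tree structure and the parameters from data.

```python
from math import prod

def binarize(doc_tokens, vocab):
    """Map a tokenized document to presence/absence indicators over vocab.

    Repeated tokens collapse to a single 1: the model uses word variables,
    not token counts.
    """
    present = set(doc_tokens)
    return [1 if w in present else 0 for w in vocab]

def posterior_z1(x, prior_z1, p_word_given_z):
    """Return P(Z=1 | x) for one binary latent variable Z.

    Assumes the word variables are conditionally independent given Z,
    as they are for the children of a node in a tree.
    p_word_given_z[z][i] is P(word i present | Z=z).
    """
    def joint(z, pz):
        likelihood = prod(
            p if xi else 1.0 - p
            for xi, p in zip(x, p_word_given_z[z])
        )
        return pz * likelihood

    j1 = joint(1, prior_z1)
    j0 = joint(0, 1.0 - prior_z1)
    return j1 / (j0 + j1)

# Hypothetical vocabulary and parameters, chosen only for illustration.
vocab = ["goal", "team", "stock", "market"]
p_word_given_z = {
    0: [0.05, 0.10, 0.60, 0.70],  # Z=0: finance-like word pattern
    1: [0.70, 0.80, 0.05, 0.10],  # Z=1: sports-like word pattern
}

docs = [
    ["team", "goal", "goal"],
    ["stock", "market", "team"],
]

# Each document gets a degree of membership in the Z=1 cluster.
memberships = [
    posterior_z1(binarize(d, vocab), 0.5, p_word_given_z) for d in docs
]
```

Here the first document receives a membership close to 1 and the second a membership close to 0, so Z softly partitions the collection into two clusters. In an HLTM, each latent variable at each level yields one such partition, and the cluster for the state associated with high word-occurrence probabilities is the one read as a topic.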
Dec-21-2016