Topic Discovery through Data Dependent and Random Projections
Ding, Weicong, Rohban, Mohammad H., Ishwar, Prakash, Saligrama, Venkatesh
We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We also discuss the statistical guarantees of the data-dependent projections method, which rest on two mild assumptions on the prior density of the topic-document matrix. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.
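The key insight stated in the abstract can be illustrated with a small numerical sketch. It is not the authors' algorithm: it assumes an idealized, noise-free word-document frequency matrix built from a synthetic separable topic model, and every name in it (`extreme_words`, `beta`, `theta`, the sizes `K`, `W`, `M`) is hypothetical. After each word's cross-document pattern is normalized to sum to one, the patterns of non-novel words are convex combinations of the novel-word patterns, so the maximum and minimum of a random projection should fall on novel words.

```python
import numpy as np

def extreme_words(X, num_directions, rng):
    """Indices of words attaining the max or min of some random projection."""
    # Normalize each row so every word's cross-document pattern sums to 1.
    X_bar = X / X.sum(axis=1, keepdims=True)
    # One random Gaussian direction per column (assumed choice of distribution).
    directions = rng.standard_normal((X.shape[1], num_directions))
    projected = X_bar @ directions  # shape: (words, directions)
    found = set(projected.argmax(axis=0).tolist())
    found |= set(projected.argmin(axis=0).tolist())
    return sorted(found)

rng = np.random.default_rng(0)
K, W, M = 3, 50, 200  # topics, vocabulary size, documents (illustrative sizes)

# Word-topic matrix: non-novel words have positive weight in every topic.
beta = rng.random((W, K)) + 0.1
# Separability assumption: words 0..K-1 are novel, one per topic.
beta[:K] = np.eye(K)
beta /= beta.sum(axis=0)  # each column is a distribution over words

# Topic-document weights drawn from a Dirichlet prior (an assumed prior).
theta = rng.dirichlet(np.full(K, 0.5), size=M).T  # shape: (K, M)

# Idealized (infinite-words-per-document) word frequencies.
X = beta @ theta

novel = extreme_words(X, num_directions=20, rng=rng)
print("extreme words:", novel)
```

With real documents the frequencies are noisy estimates, so the extremes identify novel words only approximately; the statistical guarantees mentioned in the abstract concern that setting, which this sketch does not model.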
Mar-18-2013
- Country:
- Asia > Middle East (0.67)
- Europe (0.67)
- North America > United States
- California (0.28)
- Genre:
- Research Report (1.00)
- Industry:
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Law (0.92)
- Energy (0.67)
- Law Enforcement & Public Safety (0.67)
- Government > Regional Government
- Technology: