Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps

Millar, Jeremy R. (Air Force Institute of Technology) | Peterson, Gilbert L. (Air Force Institute of Technology) | Mendenhall, Michael J. (Air Force Institute of Technology)

AAAI Conferences 

Clustering and visualization of large text document collections aids in browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive two-dimensional format. Document topics are inferred using a probabilistic topic model. Then, due to the topology preserving properties of self-organizing maps, document clusters with similar topic distributions are placed near one another in the visualization. This provides the user an intuitive means of browsing from one cluster to another based on topics held in common. The effectiveness of LDA-SOM is evaluated on the 20 Newsgroups and NIPS data sets.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found