fine-grained topic cluster
Unsupervised Discovery of Fine-Grained Topic Clusters in Twitter Posts
Markman, Vita (Independent Researcher)
This paper reports on work in progress whose goal is to use Latent Dirichlet Allocation (LDA) to discover topic clusters within a small set of Twitter posts. Preliminary results indicate that micro-documents are amenable to topic clustering via LDA provided that a) only nouns and verbs are used, and b) posts are “padded” with words of similar meaning to those used in the posts. These preliminary findings are consistent with the fact that probabilistic topic models rely on word co-occurrences in documents and hence require that topic-indicative words appear together many times throughout the data sample. The results of this pilot study are to be extended to a larger data set of Twitter posts to see whether “padding” can counteract the growing size of the data and the presence of numerous information-sparse posts.
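The two preprocessing conditions described above (keeping only nouns and verbs, then padding each post with words of similar meaning) can be illustrated with a minimal sketch. The abstract does not name the tagger or lexical resource used, so the toy dictionaries `POS_LEXICON` and `SIMILAR_WORDS`, the function names, and the example posts below are all hypothetical stand-ins, not the author's implementation. The sketch stops before the LDA step; it only shows how padding can create the word overlap between posts that a co-occurrence-based model depends on.

```python
from collections import Counter

# Hypothetical toy resources. A real pipeline would use a POS tagger and a
# thesaurus-like resource; the abstract does not say which ones were used.
POS_LEXICON = {
    "flight": "NOUN", "delayed": "VERB", "airport": "NOUN", "plane": "NOUN",
    "landed": "VERB", "the": "DET", "my": "PRON", "again": "ADV",
    "at": "ADP", "finally": "ADV",
}
SIMILAR_WORDS = {
    "flight": ["plane", "trip"],
    "plane": ["flight", "aircraft"],
    "delayed": ["postponed"],
    "landed": ["arrived"],
}


def keep_nouns_and_verbs(tokens):
    """Drop every token that the toy lexicon does not tag as a noun or verb."""
    return [t for t in tokens if POS_LEXICON.get(t) in {"NOUN", "VERB"}]


def pad(tokens):
    """Append words of similar meaning for each token that has any listed."""
    padded = list(tokens)
    for t in tokens:
        padded.extend(SIMILAR_WORDS.get(t, []))
    return padded


def preprocess(post):
    """Lowercase, split on whitespace, filter by POS, then pad."""
    return pad(keep_nouns_and_verbs(post.lower().split()))


def shared_vocabulary(docs):
    """Return the words that occur in more than one document."""
    doc_freq = Counter(w for d in docs for w in set(d))
    return {w for w, n in doc_freq.items() if n > 1}


# Hypothetical example posts.
posts = ["My flight delayed again", "The plane finally landed at the airport"]
unpadded = [keep_nouns_and_verbs(p.lower().split()) for p in posts]
padded = [preprocess(p) for p in posts]

print(shared_vocabulary(unpadded))  # no word is shared between the two posts
print(shared_vocabulary(padded))    # "flight" and "plane" now occur in both
```

In this toy case the two posts share no nouns or verbs before padding, while after padding both contain "flight" and "plane". The padded token lists could then be passed to any LDA implementation.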