Subset Labeled LDA for Large-Scale Multi-Label Classification
Papanikolaou, Yannis; Tsoumakas, Grigorios
Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, as label set sizes increase, LLDA encounters scalability issues. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only scales effectively to problems with hundreds of thousands of labels but also improves over the LLDA state of the art. We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as the state of the art in extreme multi-label classification. The results show a steady advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.
Sep-16-2017
- Country:
- Europe (0.46)
- North America > United States (0.29)
- Genre:
- Research Report > New Finding (0.48)
- Technology: