Semisupervised Text Classification Using Unsupervised Topic Information
Dorado, Rubén (École de Technologie Supérieure, Université du Québec) | Ratté, Sylvie
Labeling corpora is a time-consuming and recurring problem when developing practical NLP applications. In this paper, we present a semi-supervised method to build a text classifier using unsupervised topic information. The objective is to use the least amount of labeled data to accelerate the creation of corpora for classification in specific domains. We show that it is possible to obtain performance similar to state-of-the-art methods, despite the limited quantity of data.
May-8-2016