Concept Labeling: Building Text Classifiers with Minimal Supervision
Chenthamarakshan, Vijil (IBM T J Watson Research Center Yorktown Heights) | Melville, Prem (IBM T J Watson Research Center Yorktown Heights) | Sindhwani, Vikas (IBM T J Watson Research Center Yorktown Heights) | Lawrence, Richard D (IBM T J Watson Research Center Yorktown Heights)
The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.
Jul-19-2011
- Country:
- Asia (0.04)
- Africa (0.04)
- North America > United States
- Wisconsin > Dane County
- Madison (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Wisconsin > Dane County
- Genre:
- Research Report > New Finding (0.68)
- Industry:
- Health & Medicine > Therapeutic Area (0.47)
- Technology: