DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning
Guo, Jiaxin; Chen, C. L. Philip; Li, Shuzhen; Zhang, Tong
arXiv.org Artificial Intelligence
Cold-start active learning (CSAL) selects valuable instances from an unlabeled dataset for manual annotation, providing high-quality data at low annotation cost for label-scarce text classification. However, existing CSAL methods overlook weak classes and hard representative examples, which results in biased learning. To address these issues, this paper proposes DEUCE, a dual-diversity enhancing and uncertainty-aware framework for CSAL. DEUCE first uses a pretrained language model (PLM) to efficiently extract textual representations, class predictions, and predictive uncertainty. It then constructs a Dual-Neighbor Graph (DNG) that combines information on textual diversity and class diversity, ensuring a balanced data distribution, and propagates uncertainty information via density-based clustering to select hard representative instances. By combining dual diversity with informativeness, DEUCE selects class-balanced and hard representative data. Experiments on six NLP datasets demonstrate the superiority and efficiency of DEUCE.
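The pipeline described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes the PLM outputs (embeddings and class probabilities) are already available as arrays, uses prediction entropy as the uncertainty measure, and swaps the paper's density-based clustering for simple neighbor averaging plus greedy neighbor suppression. All function names and the parameters `k`, `alpha`, and `steps` are hypothetical.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of the k nearest rows (Euclidean) for each row, excluding itself."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def dual_neighbor_graph(embeddings, class_probs, k):
    """Symmetric adjacency: i~j if they are neighbors in embedding space
    OR in class-prediction space (an assumed reading of 'dual-neighbor')."""
    n = len(embeddings)
    adj = np.zeros((n, n), dtype=bool)
    rows = np.repeat(np.arange(n), k)
    for feats in (embeddings, class_probs):
        adj[rows, knn_indices(feats, k).ravel()] = True
    return adj | adj.T

def entropy(class_probs):
    """Shannon entropy of each predicted class distribution."""
    p = np.clip(class_probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def propagate(uncertainty, adj, steps=1, alpha=0.5):
    """Blend each node's score with the mean score of its graph neighbors.
    Stand-in for the paper's density-based propagation, not a reproduction."""
    a = adj.astype(float)
    deg = np.maximum(a.sum(axis=1), 1.0)
    u = np.asarray(uncertainty, dtype=float)
    for _ in range(steps):
        u = (1 - alpha) * u + alpha * (a @ u) / deg
    return u

def select(embeddings, class_probs, budget, k=5):
    """Greedily pick high-scoring nodes, skipping neighbors of earlier picks."""
    adj = dual_neighbor_graph(embeddings, class_probs, k)
    scores = propagate(entropy(class_probs), adj)
    chosen = []
    blocked = np.zeros(len(scores), dtype=bool)
    for i in np.argsort(-scores):
        if len(chosen) >= budget:
            break
        if not blocked[i]:
            chosen.append(int(i))
            blocked |= adj[i]  # suppress graph neighbors to keep picks diverse
            blocked[i] = True
    return chosen
```

Calling `select(embeddings, class_probs, budget=5, k=3)` returns at most `budget` row indices to send for annotation; it can return fewer if neighbor suppression exhausts the pool first.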
Jan-31-2025
- Country:
- Africa
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Rwanda > Kigali
- Kigali (0.04)
- Asia
- China > Guangdong Province
- Guangzhou (0.04)
- Singapore (0.04)
- Europe
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Malta (0.04)
- Spain > Valencian Community
- Valencia Province > Valencia (0.04)
- United Kingdom > England
- Greater Manchester > Manchester (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- Dominican Republic (0.04)
- United States
- Colorado > Boulder County
- Boulder (0.04)
- Florida > Volusia County
- Daytona Beach (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New York > New York County
- New York City (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Education (1.00)
- Information Technology (0.67)
- Technology: