Boosting Prompt-Based Self-Training With Mapping-Free Automatic Verbalizer for Multi-Class Classification

Kho, Yookyung; Kim, Jaehee; Kang, Pilsung

Dec-8-2023 – arXiv.org Artificial Intelligence

Recently, prompt-based fine-tuning has garnered considerable interest as a core technique for few-shot text classification tasks. This approach reformulates the fine-tuning objective to align with the Masked Language Modeling (MLM) objective. By leveraging unlabeled data, prompt-based self-training has proven more effective in binary and three-class classification. However, prompt-based self-training for multi-class classification has not been adequately investigated, despite its significant applicability to real-world scenarios. Moreover, extending current methods to multi-class classification is hampered by the verbalizer, which extracts from the MLM predictions only the predicted value of a single, manually pre-defined label word for each class. We therefore introduce a novel, efficient verbalizer structure named the Mapping-free Automatic Verbalizer (MAV). Comprising two fully connected layers, MAV serves as a trainable verbalizer that automatically extracts the word features required for classification by exploiting all available information in the MLM predictions. Experimental results on five multi-class classification datasets indicate MAV's superior self-training efficacy.
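The abstract contrasts a manual verbalizer, which reads off the MLM logit of one hand-picked label word per class, with MAV, which feeds the entire MLM output at the mask position through two fully connected layers. The plain-Python sketch below illustrates that contrast plus a generic confidence-threshold pseudo-labeling step for self-training. It is a hypothetical illustration, not the authors' implementation: the names `manual_verbalizer`, `linear`, `MappingFreeVerbalizerSketch`, and `pseudo_label`, the layer sizes, the uniform initialization, the absence of a nonlinearity between the two layers, and the thresholding rule are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def manual_verbalizer(mlm_logits: Sequence[float], label_word_ids: Sequence[int]) -> Vector:
    """Baseline: keep only the MLM logit of one pre-defined label word per class."""
    return [mlm_logits[i] for i in label_word_ids]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> Vector:
    """Fully connected layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class MappingFreeVerbalizerSketch:
    """Hypothetical sketch of a trainable verbalizer made of two fully connected layers.

    Assumption: no activation between the layers. Two stacked linear layers then
    compose to one linear map; they are kept separate only to mirror the
    two-layer structure described in the abstract. Training is omitted.
    """

    def __init__(self, vocab_size: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def init(out_dim: int, in_dim: int) -> Matrix:
            # Assumed uniform init scaled by 1/sqrt(fan_in).
            scale = 1.0 / math.sqrt(in_dim)
            return [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]

        self.w1, self.b1 = init(hidden_dim, vocab_size), [0.0] * hidden_dim
        self.w2, self.b2 = init(num_classes, hidden_dim), [0.0] * num_classes

    def __call__(self, mlm_logits: Sequence[float]) -> Vector:
        features = linear(mlm_logits, self.w1, self.b1)  # vocab_size -> hidden_dim
        return linear(features, self.w2, self.b2)        # hidden_dim -> num_classes


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(class_logits_batch: Sequence[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """Generic self-training step (assumed): keep unlabeled examples whose
    top class probability reaches the threshold, returning (index, label) pairs."""
    kept = []
    for idx, logits in enumerate(class_logits_batch):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold:
            kept.append((idx, probs.index(confidence)))
    return kept
```

For example, with a toy vocabulary of 10 tokens, `manual_verbalizer(logits, [2, 5, 7])` ignores every logit except those three, so perturbing any other vocabulary entry leaves its output unchanged, while `MappingFreeVerbalizerSketch(10, 4, 3)(logits)` responds to every entry. That difference is the "all available information from MLM predictions" point the abstract makes.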

classification, computational linguistic, label word

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
- Europe
  - Kosovo (0.04)
  - Austria (0.04)
  - Spain (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)
  - South Korea > Seoul
    - Seoul (0.04)
  - India > Telangana
    - Hyderabad (0.04)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Leisure & Entertainment > Sports (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Classification (0.49)
  - Machine Learning
    - Statistical Learning (0.46)
    - Unsupervised or Indirectly Supervised Learning (0.35)