Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings

Gao, Lingyu

arXiv.org Artificial Intelligence 

Text classification, a classic task in natural language processing (NLP), involves assigning predefined categories to textual data and is crucial for applications ranging from sentiment analysis to spam detection. This thesis advances text classification by harnessing the intrinsic knowledge of Pretrained Language Models (PLMs) to address three challenging scenarios: distractor selection for multiple-choice cloze questions, improving robustness for prompt-based zero-shot text classification, and demonstration selection for retrieval-based in-context learning. Firstly, we focus on selecting distractors for multiple-choice cloze questions, ensuring that they are misleading yet incorrect. We assess the relationship between human experts' annotations (accept/reject) and various features, including context-free features (e.g., word frequency) and context-sensitive features (e.g., conditional probabilities of fillin-the-blank words). We utilize pretrained embeddings and follow annotation instructions for context-free feature design, and we find that using contextualized word representations from PLMs as features drastically improves performance over traditional feature-based models, even rivaling human performance (Chapter 3).

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found