Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling
Khalifa, Muhammad, Abdul-Mageed, Muhammad, Shaalan, Khaled
arXiv.org Artificial Intelligence
Fine-tuning pre-trained language models for downstream tasks requires a sufficient amount of annotated data. Unfortunately, obtaining labeled data can be costly, especially for multiple language varieties and dialects. We propose to self-train pre-trained language models in zero- and few-shot scenarios to improve performance on data-scarce dialects using only resources from data-rich ones. We demonstrate the utility of our approach in the context of Arabic sequence labeling by using a language model fine-tuned on Modern Standard Arabic (MSA) only to predict named entities (NE) and part-of-speech (POS) tags on several dialectal Arabic (DA) varieties. We show that self-training is indeed powerful, improving zero-shot MSA-to-DA transfer by as much as ~10% F1 (NER) and 2% accuracy (POS tagging). We achieve even better performance in few-shot scenarios with limited labeled data. We conduct an ablation experiment and show that the observed performance boost results directly from the unlabeled DA examples used for self-training. This opens up opportunities for developing DA models that exploit only MSA resources. Our approach can also be extended to other languages and tasks.
Jan-14-2021
- Country:
  - North America > Canada
    - British Columbia (0.04)
  - Europe
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
  - Asia > Middle East
    - UAE > Dubai Emirate > Dubai (0.04)
  - Africa > Middle East
    - Egypt > Cairo Governorate > Cairo (0.04)
- Genre:
  - Research Report (0.82)
- Technology: