MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages
Shashank Sonkar, Zichao Wang, Richard G. Baraniuk
arXiv.org Artificial Intelligence
This paper investigates Named Entity Recognition (NER) for extreme low-resource languages, where only a few hundred tagged data samples are available. NER is a fundamental task in Natural Language Processing (NLP). A key driver of progress in NER has been the availability of large-scale language corpora, which enable NER systems to achieve outstanding performance in languages with abundant training data, such as English and French. However, NER for low-resource languages remains relatively unexplored. In this paper, we introduce Mask Augmented Named Entity Recognition (MANER), a new methodology that leverages the distributional hypothesis of pre-trained masked language models (MLMs) for NER. The <mask> token in pre-trained MLMs encodes valuable semantic contextual information, and MANER re-purposes the <mask> token for NER prediction. Specifically, we prepend the <mask> token to every word in a sentence for which we would like to predict the named entity tag. During training, we jointly fine-tune the MLM and a new NER prediction head attached to each <mask> token. We demonstrate that MANER is well-suited for NER in low-resource languages: our experiments show that for 100 languages with as few as 100 training examples, it improves F1 score over state-of-the-art methods by up to 48% and by 12% on average. We also perform detailed analyses and ablation studies to understand the scenarios best suited to MANER.
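The input construction described above (one <mask> token prepended to every word, with the NER label read off at that <mask> position) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: the helper names `maner_augment` and `mask_positions`, the example tag set, and the use of -100 as the "no loss here" label (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) are assumptions. It also works on whole words, whereas a real MLM tokenizer would split words into subword pieces.

```python
from typing import Dict, List, Tuple

# Assumption: -100 mirrors PyTorch's CrossEntropyLoss default ignore_index,
# so positions that are not <mask> tokens contribute no NER loss.
IGNORE_INDEX = -100


def maner_augment(
    words: List[str],
    tags: List[str],
    tag_to_id: Dict[str, int],
    mask_token: str = "<mask>",
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: prepend mask_token to every word and attach each
    word's NER label to the mask token placed in front of it."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    tokens: List[str] = []
    labels: List[int] = []
    for word, tag in zip(words, tags):
        # The prepended mask token is the position the NER head predicts from.
        tokens.append(mask_token)
        labels.append(tag_to_id[tag])
        # The word itself stays in the input as context but is not a target.
        tokens.append(word)
        labels.append(IGNORE_INDEX)
    return tokens, labels


def mask_positions(tokens: List[str], mask_token: str = "<mask>") -> List[int]:
    """Indices whose MLM hidden states would feed the NER prediction head."""
    return [i for i, tok in enumerate(tokens) if tok == mask_token]


if __name__ == "__main__":
    tag_to_id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-LOC": 3}
    words = ["Ada", "Lovelace", "visited", "Paris"]
    tags = ["B-PER", "I-PER", "O", "B-LOC"]
    tokens, labels = maner_augment(words, tags, tag_to_id)
    print(tokens)   # ['<mask>', 'Ada', '<mask>', 'Lovelace', '<mask>', 'visited', '<mask>', 'Paris']
    print(labels)   # [1, -100, 2, -100, 0, -100, 3, -100]
    print(mask_positions(tokens))  # [0, 2, 4, 6]
```

In a full model, the hidden states at the indices returned by `mask_positions` would be passed to the NER prediction head, and the MLM and the head would be fine-tuned jointly, as the abstract describes. How MANER handles subword tokenization is not stated here, so that detail is left out of the sketch.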
Dec-19-2022