MANER: Mask Augmented Named Entity Recognition for Extreme Low-Resource Languages

Sonkar, Shashank, Wang, Zichao, Baraniuk, Richard G.

Dec-19-2022–arXiv.org Artificial Intelligence

This paper investigates the problem of Named Entity Recognition (NER) for extreme low-resource languages with only a few hundred tagged data samples. NER is a fundamental task in Natural Language Processing (NLP). A critical driver of progress in NER systems is the existence of large-scale language corpora, which enable outstanding performance in languages with abundant training data, such as English and French. However, NER for low-resource languages remains relatively unexplored. In this paper, we introduce Mask Augmented Named Entity Recognition (MANER), a new methodology that leverages the distributional hypothesis of pre-trained masked language models (MLMs) for NER. The <mask> token in pre-trained MLMs encodes valuable semantic contextual information. MANER re-purposes the <mask> token for NER prediction. Specifically, we prepend the <mask> token to every word in a sentence for which we would like to predict the named entity tag. During training, we jointly fine-tune the MLM and a new NER prediction head attached to each <mask> token. We demonstrate that MANER is well-suited for NER in low-resource languages; our experiments show that for 100 languages with as few as 100 training examples, it improves on state-of-the-art methods by up to 48% and by 12% on average in F1 score. We also perform detailed analyses and ablation studies to understand the scenarios that are best suited to MANER.
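The input augmentation described in the abstract can be illustrated with a minimal sketch. It covers only the data-preparation step (prepending a mask token to each word and aligning tags to the mask positions), not the MLM or the NER prediction head. The names `MASK`, `IGNORE`, and `mask_augment` are hypothetical, the mask string depends on the tokenizer of the MLM actually used, and the use of -100 as an ignored label is a common convention assumed here, not something the abstract states.

```python
MASK = "<mask>"   # assumed mask string; the real one depends on the MLM's tokenizer
IGNORE = -100     # assumed label for positions that should not contribute to the loss


def mask_augment(words, tags=None):
    """Prepend MASK to every word and align NER tags to the mask positions.

    Returns (tokens, mask_positions, labels); labels is None when no tags are given.
    """
    tokens, mask_positions = [], []
    for word in words:
        mask_positions.append(len(tokens))  # index where this word's mask lands
        tokens.extend([MASK, word])

    labels = None
    if tags is not None:
        if len(tags) != len(words):
            raise ValueError("expected one tag per word")
        # Only mask positions carry a tag; the original words are ignored.
        labels = [IGNORE] * len(tokens)
        for pos, tag in zip(mask_positions, tags):
            labels[pos] = tag
    return tokens, mask_positions, labels


# Illustrative sentence and tags (made up for this sketch).
words = ["Richard", "lives", "in", "Houston"]
tags = ["B-PER", "O", "O", "B-LOC"]
tokens, mask_positions, labels = mask_augment(words, tags)
print(" ".join(tokens))
```

In a full pipeline, the augmented sequence would be passed through the MLM, and a classification head would read the hidden state at each index in `mask_positions`; subword tokenization would shift those indices, which this sketch does not handle.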

Keywords: artificial intelligence, MANER, natural language


Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America > United States
  - Michigan (0.04)
- Europe > Italy
  - Tuscany > Florence (0.04)

Genre:
- Overview (0.86)
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
