Does Synthetic Data Help Named Entity Recognition for Low-Resource Languages?
Kamath, Gaurav; Vajjala, Sowmya
arXiv.org Artificial Intelligence
Named Entity Recognition (NER) for low-resource languages aims to produce robust systems for languages with limited labeled training data, and it has been an area of increasing interest within NLP. Data augmentation is a common way to increase the amount of labeled data in low-resource settings. In this paper, we explore the role of synthetic data in multilingual, low-resource NER, considering 11 languages from diverse language families. Our results suggest that synthetic data does in fact hold promise for low-resource language NER, though we see significant variation between languages.
Nov-6-2025
- Country:
  - Africa > Niger (0.04)
  - Asia
    - Middle East > UAE
      - Abu Dhabi Emirate > Abu Dhabi (0.05)
    - Singapore (0.04)
    - Thailand > Bangkok
      - Bangkok (0.04)
  - Europe
    - Austria (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Italy > Tuscany
      - Florence (0.04)
  - North America
    - Canada
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States > Washington
      - King County > Seattle (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Government (0.46)
- Technology: