NaijaNLP: A Survey of Nigerian Low-Resource Languages
arXiv.org Artificial Intelligence
Nigeria has over 500 languages, yet three of them (Hausa, Yorùbá and Igbo), spoken by over 175 million people, account for about 60% of the spoken languages. However, these languages are categorised as low-resource because the resources available to support tasks in computational linguistics are insufficient. Several research efforts and initiatives have been presented; however, a coherent understanding of the state of Natural Language Processing (NLP), from grammatical formalisation to linguistic resources that support complex tasks such as language understanding and generation, is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various downstream NLP tasks in Hausa, Igbo, and Yorùbá, only about 25.1% of the reviewed studies contribute new linguistic resources. This finding highlights a persistent reliance on repurposing existing data rather than generating novel, high-quality resources. Additionally, language-specific challenges, such as the accurate representation of diacritics, remain under-explored. To advance NaijaNLP and LR-NLP more broadly, we emphasise the need for intensified efforts in resource enrichment, comprehensive annotation, and the development of open collaborative initiatives.
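As a rough illustration of why diacritic representation matters, the sketch below uses Python's standard `unicodedata` module to show that the same word can be stored as different code point sequences, and that removing combining marks collapses tone-marked forms into one string. It is not taken from the survey: the helper name `strip_diacritics` and the sample Igbo forms of "akwa" are assumptions chosen for illustration only.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks, mimicking a diacritic-unaware pipeline (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same word in composed (NFC) and decomposed (NFD) form.
composed = unicodedata.normalize("NFC", "Yorùbá")
decomposed = unicodedata.normalize("NFD", "Yorùbá")

# The two forms render identically but differ in length and compare unequal
# unless both are normalised to the same form first.
print(len(composed), len(decomposed), composed == decomposed)

# Illustrative tone-marked forms (an assumption, not data from the survey):
# once the marks are removed, they can no longer be told apart.
variants = ["ákwá", "àkwà", "ákwà", "àkwá"]
collapsed = {strip_diacritics(v) for v in variants}
print(collapsed)
```

Normalising all text to a single form (for example NFC) before tokenisation or deduplication is one common way to avoid treating identical words as different; whether marks should ever be stripped depends on the task.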
Mar-6-2025
- Country:
  - Africa
    - Togo (0.04)
    - Niger (0.46)
    - Sudan (0.04)
    - Kenya (0.04)
    - South Africa (0.04)
    - Nigeria
      - Jigawa State > Dutse (0.05)
      - Kwara State (0.04)
      - Niger State > Minna (0.04)
      - Ogun State > Abeokuta (0.04)
      - Oyo State > Ibadan (0.04)
    - Central African Republic (0.14)
    - Ghana (0.04)
    - Namibia (0.04)
    - Central Africa (0.04)
    - Senegal (0.04)
    - Cameroon (0.45)
    - Benin (0.04)
    - Chad (0.27)
    - Côte d'Ivoire (0.04)
  - Asia
    - Indonesia > Bali (0.04)
    - Middle East > Israel (0.04)
    - Myanmar > Tanintharyi Region
      - Dawei (0.04)
  - Atlantic Ocean > Black Sea (0.04)
  - Europe
    - Czechia > South Moravian Region
      - Brno (0.04)
    - Spain (0.04)
    - Sweden > Norrbotten County
      - Luleå (0.04)
    - United Kingdom > England
      - West Yorkshire > Huddersfield (0.04)
  - North America
    - Canada > Ontario
      - National Capital Region > Ottawa (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - Indiana (0.04)
      - New Jersey (0.04)
      - New York > Erie County
        - Buffalo (0.04)
  - South America > Chile
- Genre:
  - Overview (1.00)
  - Research Report (1.00)
- Industry:
  - Education
    - Curriculum (0.45)
    - Educational Setting (0.45)
  - Health & Medicine (1.00)
  - Information Technology > Security & Privacy (0.46)
  - Media > News (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
    - Natural Language
      - Chatbot (0.68)
      - Grammars & Parsing (1.00)
      - Large Language Model (0.68)
      - Machine Translation (1.00)
      - Text Processing (1.00)