Building a Functional Machine Translation Corpus for Kpelle
Yamoah, Kweku Andoh, Weako, Jackson, Dorley, Emmanuel J.
arXiv.org Artificial Intelligence
In this paper, we introduce the first publicly available English-Kpelle dataset for machine translation, comprising over 2,000 sentence pairs drawn from everyday communication, religious texts, and educational materials. By fine-tuning Meta's No Language Left Behind (NLLB) model on two versions of the dataset, we achieved BLEU scores of up to 30 in the Kpelle-to-English direction, demonstrating the benefits of data augmentation. Our findings align with NLLB-200 benchmarks on other African languages, underscoring Kpelle's potential for competitive performance despite its low-resource status. Beyond machine translation, this dataset enables broader NLP tasks, including speech recognition and language modelling. We conclude with a roadmap for future dataset expansion, emphasizing orthographic consistency, community-driven validation, and interdisciplinary collaboration to advance inclusive language technology development for Kpelle and other low-resource Mande languages.
May-27-2025
- Country:
  - Africa
    - Liberia (0.05)
    - Namibia (0.04)
    - Niger (0.04)
    - South Africa (0.04)
    - West Africa (0.04)
  - Europe
    - Belgium (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Michigan (0.04)
      - Wisconsin > Dane County
        - Madison (0.04)
  - Oceania > Australia (0.14)
- Genre:
  - Instructional Material (0.66)
  - Research Report (0.70)
- Technology: