Kalipe, Godson
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
Adelani, David Ifeoluwa, Ojo, Jessica, Azime, Israel Abebe, Zhuang, Jian Yun, Alabi, Jesujoba O., He, Xuanli, Ochieng, Millicent, Hooker, Sara, Bukula, Andiswa, Lee, En-Shiun Annie, Chukwuneke, Chiamaka, Buzaaba, Happy, Sibanda, Blessing, Kalipe, Godson, Mukiibi, Jonathan, Kabongo, Salomon, Yuehgoh, Foutse, Setaka, Mmasibidi, Ndolela, Lolwethu, Odu, Nkiruka, Mabuya, Rooweither, Muhammad, Shamsuddeen Hassan, Osei, Salomey, Samb, Sokhar, Guge, Tadesse Kebede, Stenetorp, Pontus
Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 16 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate ten open and four proprietary LLMs in zero-shot, few-shot, and translate-test settings (where test sets are translated into English). Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a substantial gap between open and proprietary models: the best-performing open model, Aya-101, reaches only 58% of the performance of the best proprietary model, GPT-4o. Machine-translating the test sets into English before evaluation helped to close the gap for larger, English-centric models such as LLaMa 3 70B. These findings suggest that more effort is needed to develop and adapt LLMs for African languages.
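For context, the sketch below shows how a zero-shot evaluation loop over one IrokoBench subtask might be set up. It assumes the AfriMMLU subset is hosted on the Hugging Face Hub under the masakhane organization; the dataset identifier, config name, field names, and prompt template are all assumptions here, not the paper's exact protocol.

```python
# Minimal zero-shot AfriMMLU-style evaluation sketch (all identifiers assumed).
from datasets import load_dataset

def format_prompt(example):
    # Multi-choice knowledge QA: show the question with lettered options
    # and ask the model to answer with a single letter.
    options = "\n".join(
        f"{letter}. {choice}"
        for letter, choice in zip("ABCD", example["choices"])
    )
    return (
        f"Question: {example['question']}\n{options}\n"
        "Answer with the letter of the correct option."
    )

# Yoruba test split; "masakhane/afrimmlu" and the "yor" config are assumptions.
data = load_dataset("masakhane/afrimmlu", "yor", split="test")
prompts = [format_prompt(ex) for ex in data]
# Each prompt would be sent to the LLM under evaluation; accuracy is the
# fraction of model responses whose letter matches the gold answer.
```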
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Dione, Cheikh M. Bamba, Adelani, David, Nabende, Peter, Alabi, Jesujoba, Sindane, Thapelo, Buzaaba, Happy, Muhammad, Shamsuddeen Hassan, Emezue, Chris Chinenye, Ogayo, Perez, Aremu, Anuoluwapo, Gitau, Catherine, Mbaye, Derguene, Mukiibi, Jonathan, Sibanda, Blessing, Dossou, Bonaventure F. P., Bukula, Andiswa, Mabuya, Rooweither, Tapo, Allahsera Auguste, Munkoh-Buabeng, Edwin, Koagne, Victoire Memdjokam, Kabore, Fatoumata Ouoba, Taylor, Amelia, Kalipe, Godson, Macucwa, Tebogo, Marivate, Vukosi, Gwadabe, Tajuddeen, Elvis, Mboning Tchiaze, Onyenwe, Ikechukwu, Atindogbe, Gratien, Adelani, Tolulope, Akinade, Idris, Samuel, Olanrewaju, Nahimana, Marien, Musabeyezu, Théogène, Niyomutabazi, Emile, Chimhenga, Ester, Gotosa, Kudzai, Mizha, Patrick, Agbolo, Apelete, Traore, Seydou, Uchechukwu, Chinedu, Yusuf, Aliyu, Abdullahi, Muhammad, Klakow, Dietrich
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using a conditional random field and several multilingual pre-trained language models, and applied various cross-lingual transfer methods trained on data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves POS tagging performance for the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties seems more effective for POS tagging in unseen languages.
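A baseline like the paper's conditional random field tagger can be sketched in a few lines with sklearn-crfsuite; the hand-crafted feature set and the toy sentence below are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal CRF POS-tagging baseline sketch (features and data are toy examples).
import sklearn_crfsuite

def word_features(sent, i):
    # Simple surface features for the token at position i.
    word = sent[i]
    feats = {
        "word.lower": word.lower(),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        feats["prev.lower"] = sent[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["next.lower"] = sent[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats

# Toy sentence standing in for MasakhaPOS training data (tokens, UPOS tags).
train_sents = [["Ninaenda", "sokoni", "leo"]]
train_tags = [["VERB", "NOUN", "ADV"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, train_tags)
print(crf.predict(X))
```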
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
Adelani, David Ifeoluwa, Neubig, Graham, Ruder, Sebastian, Rijhwani, Shruti, Beukman, Michael, Palen-Michel, Chester, Lignos, Constantine, Alabi, Jesujoba O., Muhammad, Shamsuddeen H., Nabende, Peter, Dione, Cheikh M. Bamba, Bukula, Andiswa, Mabuya, Rooweither, Dossou, Bonaventure F. P., Sibanda, Blessing, Buzaaba, Happy, Mukiibi, Jonathan, Kalipe, Godson, Mbaye, Derguene, Taylor, Amelia, Kabore, Fatoumata, Emezue, Chris Chinenye, Aremu, Anuoluwapo, Ogayo, Perez, Gitau, Catherine, Munkoh-Buabeng, Edwin, Koagne, Victoire M., Tapo, Allahsera Auguste, Macucwa, Tebogo, Marivate, Vukosi, Mboning, Elvis, Gwadabe, Tajuddeen, Adewumi, Tosin, Ahia, Orevaoghene, Nakatumba-Nabende, Joyce, Mokono, Neo L., Ezeani, Ignatius, Chukwuneke, Chiamaka, Adeyemi, Mofetoluwa, Hacheme, Gilles Q., Abdulmumin, Idris, Ogundepo, Odunayo, Yousuf, Oreen, Ngoli, Tatiana Moteu, Klakow, Dietrich
African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.
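A rough sketch of the zero-shot transfer evaluation described above: a source-language NER model is scored on a MasakhaNER 2.0 target language with span-level F1. The dataset identifier, config name, and column names follow common Hugging Face conventions but are assumptions here, and the model call is a placeholder.

```python
# Zero-shot NER transfer evaluation sketch (dataset and column names assumed).
from datasets import load_dataset
from seqeval.metrics import f1_score

# Hausa test split; "masakhane/masakhaner2" and "hau" are assumed identifiers.
target = load_dataset("masakhane/masakhaner2", "hau", split="test")
label_names = target.features["ner_tags"].feature.names  # e.g. O, B-PER, I-PER

def predict(tokens):
    # Placeholder for a token-classification model fine-tuned on the chosen
    # source language; a real run would call a transformers pipeline here.
    return ["O"] * len(tokens)

gold = [[label_names[t] for t in ex["ner_tags"]] for ex in target]
pred = [predict(ex["tokens"]) for ex in target]
# Span-level F1, the metric behind the 14-point average improvement claim.
print("zero-shot F1:", f1_score(gold, pred))
```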