AITopics | adelani

Collaborating Authors

adelani

Where Are We? Evaluating LLM Performance on African Languages

Adebara, Ife, Toyin, Hawau Olamide, Ghebremichael, Nahom Tesfu, Elmadany, AbdelRahim, Abdul-Mageed, Muhammad

arXiv.org Artificial Intelligence | Feb-26-2025

Africa's rich linguistic heritage remains underrepresented in NLP, largely due to historical policies that favor foreign languages and create significant data inequities. In this paper, we integrate theoretical insights on Africa's language landscape with an empirical evaluation using Sahara - a comprehensive benchmark curated from large-scale, publicly accessible datasets capturing the continent's linguistic diversity. By systematically assessing the performance of leading large language models (LLMs) on Sahara, we demonstrate how policy-induced data variations directly impact model effectiveness across African languages. Our findings reveal that while a few languages perform reasonably well, many Indigenous languages remain marginalized due to sparse data. Leveraging these insights, we offer actionable recommendations for policy reforms and inclusive data practices. Overall, our work underscores the urgent need for a dual approach - combining theoretical understanding with empirical evaluation - to foster linguistic diversity in AI for African communities.

african language, computational linguistic, dataset, (14 more...)

arXiv.org Artificial Intelligence

2502.19582

Country:

Asia > Middle East > Israel (0.04)
North America > Dominican Republic (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(30 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Education (1.00)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages

Yu, Hao, Alabi, Jesujoba O., Bukula, Andiswa, Zhuang, Jian Yun, Lee, En-Shiun Annie, Guge, Tadesse Kebede, Azime, Israel Abebe, Buzaaba, Happy, Sibanda, Blessing Kudzaishe, Kalipe, Godson K., Mukiibi, Jonathan, Kabenamualu, Salomon Kabongo, Setaka, Mmasibidi, Ndolela, Lolwethu, Odu, Nkiruka, Mabuya, Rooweither, Muhammad, Shamsuddeen Hassan, Osei, Salomey, Samb, Sokhar, Murage, Juliet W., Klakow, Dietrich, Adelani, David Ifeoluwa

arXiv.org Artificial Intelligence | Feb-13-2025

Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuned multilingual transformer models and prompted large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from English. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average F1-score of 26. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuning baselines. On English, GPT-4o and the fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that LLM performance still lags for many low-resource African languages, and more work is needed to improve their downstream performance.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.09814

Country:

Europe > Spain (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(33 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Consumer Products & Services (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

How Good is Your Wikipedia?

Tatariya, Kushal, Kulmizev, Artur, Poelman, Wessel, Ploeger, Esther, Bollmann, Marcel, Bjerva, Johannes, Luo, Jiaming, Lent, Heather, de Lhoneux, Miryam

arXiv.org Artificial Intelligence | Nov-8-2024

Wikipedia's perceived high quality and broad language coverage have established it as a fundamental resource in multilingual NLP. In the context of low-resource languages, however, these quality assumptions are increasingly being scrutinised. This paper critically examines the data quality of Wikipedia in a non-English setting by subjecting it to various quality filtering techniques, revealing widespread issues such as a high percentage of one-line articles and duplicate articles. We evaluate the downstream impact of quality filtering on Wikipedia and find that data quality pruning is an effective means for resource-efficient training without hurting performance, especially for low-resource languages. Moreover, we advocate for a shift in perspective from seeking a general definition of data quality towards a more language- and task-specific one. Ultimately, we aim for this study to serve as a guide to using Wikipedia for pretraining in a multilingual setting.
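
The two issues named in the abstract above, one-line articles and duplicate articles, can be illustrated with a minimal filtering sketch. This is not the paper's pipeline: the function name `filter_articles`, the `min_lines` threshold, and the choice of exact-match deduplication after whitespace and case normalization are all assumptions made here for illustration.

```python
import hashlib


def filter_articles(articles, min_lines=2):
    """Drop very short articles and exact duplicates.

    Hypothetical helper: `min_lines` and the normalization below are
    illustrative assumptions, not settings taken from the paper.
    """
    seen = set()
    kept = []
    for text in articles:
        # Count only non-empty lines, so blank padding does not rescue a stub.
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if len(lines) < min_lines:
            continue
        # Normalize case and whitespace before hashing so trivially
        # different copies of the same article collapse to one key.
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


sample = [
    "Only one line.",
    "Title\nBody text here.",
    "title\nbody   text here.",
    "Another\nArticle",
]
filtered = filter_articles(sample)
```

Here the first entry is dropped as a one-line article and the third as a duplicate of the second. Near-duplicate detection (for example, shingling or MinHash) would be a separate design choice and is not shown.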

artificial intelligence, ier 1, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.05527

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Singapore (0.04)
(20 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects

Ahia, Orevaoghene, Aremu, Anuoluwapo, Abagyan, Diana, Gonen, Hila, Adelani, David Ifeoluwa, Abolade, Daud, Smith, Noah A., Tsvetkov, Yulia

arXiv.org Artificial Intelligence | Jun-27-2024

Yorùbá, an African language with roughly 47 million speakers, encompasses a continuum with several dialects. Recent efforts to develop NLP technologies for African languages have focused on their standard dialects, resulting in disparities for dialects and varieties for which there are little to no resources or tools. We take steps towards bridging this gap by introducing YORÙLECT, a new high-quality parallel text and speech corpus across three domains and four regional Yorùbá dialects. To develop this corpus, we engaged native speakers, travelling to communities where these dialects are spoken, to collect text and speech data. Using our newly created corpus, we conducted extensive experiments on (text) machine translation, automatic speech recognition, and speech-to-text translation. Our results reveal substantial performance disparities between standard Yorùbá and the other dialects across all tasks. However, we also show that with dialect-adaptive finetuning, we are able to narrow this gap. We believe our dataset and experimental analysis will contribute greatly to developing NLP tools for Yorùbá and its dialects, and potentially for other African languages, by improving our understanding of existing challenges and offering a high-quality dataset for further development. We release the YORÙLECT dataset and models publicly under an open license.

computational linguistic, dialect, translation, (15 more...)

arXiv.org Artificial Intelligence

2406.19564

Country:

Asia > Singapore (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(29 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

Adelani, David Ifeoluwa, Ojo, Jessica, Azime, Israel Abebe, Zhuang, Jian Yun, Alabi, Jesujoba O., He, Xuanli, Ochieng, Millicent, Hooker, Sara, Bukula, Andiswa, Lee, En-Shiun Annie, Chukwuneke, Chiamaka, Buzaaba, Happy, Sibanda, Blessing, Kalipe, Godson, Mukiibi, Jonathan, Kabongo, Salomon, Yuehgoh, Foutse, Setaka, Mmasibidi, Ndolela, Lolwethu, Odu, Nkiruka, Mabuya, Rooweither, Muhammad, Shamsuddeen Hassan, Osei, Salomey, Samb, Sokhar, Guge, Tadesse Kebede, Stenetorp, Pontus

arXiv.org Artificial Intelligence | Jun-5-2024

Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench -- a human-translated benchmark dataset for 16 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We also observe a significant performance gap between open and proprietary models, with the highest-performing open model, Aya-101, reaching only 58% of the performance of the best proprietary model, GPT-4o. Machine translating the test set to English before evaluation helped to close the gap for larger models that are English-centric, like LLaMa 3 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.

african language, computational linguistic, evaluation, (14 more...)

arXiv.org Artificial Intelligence

2406.03368

Country:

North America > Canada > Ontario > Toronto (0.14)
Africa > Niger (0.05)
Asia > Indonesia > Bali (0.04)
(21 more...)

Genre: Research Report > New Finding (0.65)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages

Adelani, David Ifeoluwa, Doğruöz, A. Seza, Shode, Iyanuoluwa, Aremu, Anuoluwapo

arXiv.org Artificial Intelligence | Apr-30-2024

Naija is the Nigerian-Pidgin spoken by approximately 120M speakers in Nigeria, and it is a mixed language (e.g., English, Portuguese and Indigenous languages). Although it has mainly been a spoken language until recently, there are currently two written genres (BBC and Wikipedia) in Naija. Through statistical analyses and Machine Translation experiments, we prove that these two genres do not represent each other (i.e., there are linguistic differences in word order and vocabulary) and that Generative AI operates only on Naija written in the BBC genre. In other words, Naija written in the Wikipedia genre is not represented in Generative AI.

bbc genre, naija, wikipedia genre, (10 more...)

arXiv.org Artificial Intelligence

2404.19442

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Africa > West Africa (0.04)
Africa > Nigeria > Plateau State > Jos (0.04)
(12 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.81)

YORC: Yoruba Reading Comprehension dataset

Aremu, Anuoluwapo, Alabi, Jesujoba O., Adelani, David Ifeoluwa

arXiv.org Artificial Intelligence | Sep-14-2023

In this paper, we create YORC: a new multi-choice Yoruba Reading Comprehension dataset that is based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer from the existing English RACE dataset using a pre-trained encoder-only model. Additionally, we provide results by prompting large language models (LLMs) like GPT-4.

adelani, computational linguistic, dataset, (16 more...)

arXiv.org Artificial Intelligence

2308.09768

Country:

Asia > Middle East > Israel (0.05)
Africa > Nigeria (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre: Research Report (0.40)

Industry:

Education > Assessment & Standards > Student Performance (0.84)
Education > Educational Setting > K-12 Education > Secondary School (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

NollySenti: Leveraging Transfer Learning and Machine Translation for Nigerian Movie Sentiment Classification

Shode, Iyanuoluwa, Adelani, David Ifeoluwa, Peng, Jing, Feldman, Anna

arXiv.org Artificial Intelligence | Aug-22-2023

Africa has over 2000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets. In recent years, there has been progress in developing labeled corpora for African languages. However, they are often available in a single domain and may not generalize to other domains. In this paper, we focus on the task of sentiment classification for cross-domain adaptation. We create a new dataset, NollySenti, based on Nollywood movie reviews for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yoruba). We provide an extensive empirical evaluation using classical machine learning methods and pre-trained language models. Leveraging transfer learning, we compare the performance of cross-domain adaptation from the Twitter domain and cross-lingual adaptation from English. Our evaluation shows that transfer from English in the same target domain leads to more than 5% improvement in accuracy compared to transfer from Twitter in the same language. To further mitigate the domain difference, we leverage machine translation (MT) from English to other Nigerian languages, which leads to a further improvement of 7% over cross-lingual evaluation. While MT to low-resource languages is often of low quality, we show through human evaluation that most of the translated sentences preserve the sentiment of the original English reviews.

artificial intelligence, machine translation, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.10971

Country:

Africa > Nigeria (0.25)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Niger (0.05)
(32 more...)

Genre: Research Report (0.82)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Ogundepo, Odunayo, Gwadabe, Tajuddeen R., Rivera, Clara E., Clark, Jonathan H., Ruder, Sebastian, Adelani, David Ifeoluwa, Dossou, Bonaventure F. P., DIOP, Abdou Aziz, Sikasote, Claytone, Hacheme, Gilles, Buzaaba, Happy, Ezeani, Ignatius, Mabuya, Rooweither, Osei, Salomey, Emezue, Chris, Kahira, Albert Njoroge, Muhammad, Shamsuddeen H., Oladipo, Akintunde, Owodunni, Abraham Toluwase, Tonja, Atnafu Lambebo, Shode, Iyanuoluwa, Asai, Akari, Ajayi, Tunde Oluwaseyi, Siro, Clemencia, Arthur, Steven, Adeyemi, Mofetoluwa, Ahia, Orevaoghene, Aremu, Anuoluwapo, Awosan, Oyinkansola, Chukwuneke, Chiamaka, Opoku, Bernard, Ayodele, Awokoya, Otiende, Verrah, Mwase, Christine, Sinkala, Boyd, Rubungo, Andre Niyongabo, Ajisafe, Daniel A., Onwuegbuzia, Emeka Felix, Mbow, Habib, Niyomutabazi, Emile, Mukonde, Eunice, Lawan, Falalu Ibrahim, Ahmad, Ibrahim Said, Alabi, Jesujoba O., Namukombo, Martin, Chinedu, Mbonu, Phiri, Mofya, Putini, Neo, Mngoma, Ndumiso, Amuok, Priscilla A., Iro, Ruqayya Nasir, Adhiambo, Sonia

arXiv.org Artificial Intelligence | May-11-2023

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

artificial intelligence, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

2305.06897

Country:

North America > United States (0.28)
Africa > Niger (0.05)
Asia > Malaysia (0.04)
(20 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

New voices in AI: David Adelani

AIHub | Jan-26-2022, 14:30:55 GMT

Welcome to the first episode of New voices in AI! You can find David on Twitter @davlanade and find out more about Masakhane here. The music used is 'Wholesome' by Kevin MacLeod, licensed under Creative Commons.

Daly: Hello and welcome to New voices in AI. This is a new series from AIhub where we celebrate the voices of PhD students, early career researchers, and those with a new perspective on AI. And without further ado, let's begin. First up, a big welcome to our very first guest on "New voices in AI" and if you could introduce yourself, who are you?

Adelani: Thank you very much for having me. So, Masakhane is this grassroots organization, whose mission is to strengthen and spur NLP research in African languages, by Africans for Africans, so, and currently the organization we are majorly operating on Slack, we already have over 1000 members. Of course, not everyone is active but we have more than 100 or close to 100 active members as well, yeah.

So how did, how did you get into AI?

adelani, african language, daly, (14 more...)

AIHub

Country:

Africa > Nigeria (0.05)
North America > United States (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)
(5 more...)

Genre: Personal > Interview (0.67)

Technology:

Information Technology > Communications > Social Media (0.87)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.30)
