Yousuf, Oreen
MasakhaNEWS: News Topic Classification for African languages
Adelani, David Ifeoluwa, Masiak, Marek, Azime, Israel Abebe, Alabi, Jesujoba, Tonja, Atnafu Lambebo, Mwase, Christine, Ogundepo, Odunayo, Dossou, Bonaventure F. P., Oladipo, Akintunde, Nixdorf, Doreen, Emezue, Chris Chinenye, Al-Azzawi, Sana, Sibanda, Blessing, David, Davis, Ndolela, Lolwethu, Mukiibi, Jonathan, Ajayi, Tunde, Moteu, Tatiana, Odhiambo, Brian, Owodunni, Abraham, Obiefuna, Nnaemeka, Mohamed, Muhidin, Muhammad, Shamsuddeen Hassan, Ababu, Teshome Mulugeta, Salahudeen, Saheed Abdullahi, Yigezu, Mesay Gemeda, Gwadabe, Tajuddeen, Abdulmumin, Idris, Taye, Mahlet, Awoyomi, Oluwabusayo, Shode, Iyanuoluwa, Adelani, Tolulope, Abdulganiyu, Habiba, Omotayo, Abdul-Hakeem, Adeeko, Adetola, Afolabi, Abeeb, Aremu, Anuoluwapo, Samuel, Olanrewaju, Siro, Clemencia, Kimotho, Wangari, Ogbu, Onyekachi, Mbonu, Chinedu, Chukwuneke, Chiamaka, Fanijo, Samuel, Ojo, Jessica, Awosan, Oyinkansola, Kebede, Tadesse, Sakayo, Toadoum Sari, Nyatsine, Pamela, Sidume, Freedmore, Yousuf, Oreen, Oduwole, Mardiyyah, Tshinu, Tshinu, Kimanuka, Ussen, Diko, Thina, Nxakama, Siyanda, Nigusse, Sinodos, Johar, Abdulmejid, Mohamed, Shafie, Hassan, Fuad Mire, Mehamed, Moges Ahmed, Ngabire, Evrard, Jules, Jules, Ssenkungu, Ivan, Stenetorp, Pontus
African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern-exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and the Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, we achieve more than 90% (i.e. 86.0 F1 points) of the performance of full supervised training (92.6 F1 points) by leveraging the PET approach.
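As a rough illustration of the full fine-tuning baseline reported above, the sketch below fine-tunes a multilingual encoder for news topic classification with the Hugging Face Trainer. The checkpoint name, hyperparameters, and data handling are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of full fine-tuning for news topic classification.
# The checkpoint id and hyperparameters are assumptions, not the paper's setup.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "Davlan/afro-xlmr-base"  # assumed multilingual encoder; any XLM-R-style model would do


class NewsDataset(torch.utils.data.Dataset):
    """Tokenized news texts paired with integer topic labels."""

    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=256)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


def finetune(train_texts, train_labels, dev_texts, dev_labels, num_labels):
    """Fine-tune the encoder on one language's training split and return the Trainer."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=num_labels)
    args = TrainingArguments(output_dir="news-topic", num_train_epochs=5,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    trainer = Trainer(model=model, args=args,
                      train_dataset=NewsDataset(train_texts, train_labels, tokenizer),
                      eval_dataset=NewsDataset(dev_texts, dev_labels, tokenizer))
    trainer.train()
    return trainer
```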
Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages
Azime, Israel Abebe, Al-Azzawi, Sana Sabah, Tonja, Atnafu Lambebo, Shode, Iyanuoluwa, Alabi, Jesujoba, Awokoya, Ayodele, Oduwole, Mardiyyah, Adewumi, Tosin, Fanijo, Samuel, Awosan, Oyinkansola, Yousuf, Oreen
This paper describes our submission to the AfriSenti-SemEval Shared Task 12 of SemEval-2023. The task aims to perform monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C). For sub-task A, we conducted experiments using classical machine learning classifiers, Afro-centric language models, and language-specific models. For sub-task B, we fine-tuned multilingual pre-trained language models that support many of the languages in the task. For sub-task C, we used a parameter-efficient Adapter approach that leverages monolingual texts in the target language for effective zero-shot transfer. Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages. We also ran experiments using adapters for the zero-shot tasks, and the results suggest that adapters can yield promising performance with a limited amount of resources.
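The classical machine learning baseline mentioned for sub-task A could look roughly like the following sketch: character n-gram TF-IDF features feeding a logistic regression classifier. This is an illustrative pipeline, not the authors' exact feature set or classifier choice.

```python
# Minimal sketch of a classical ML sentiment baseline (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import Pipeline


def build_baseline():
    # Character n-grams are a common choice for morphologically rich,
    # low-resource languages where word-level tokenization is noisy.
    return Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=2)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def evaluate(train_texts, train_labels, test_texts, test_labels):
    """Train on one language's tweets and report weighted F1 on its test split."""
    model = build_baseline()
    model.fit(train_texts, train_labels)
    preds = model.predict(test_texts)
    return f1_score(test_labels, preds, average="weighted")
```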
Adapting to the Low-Resource Double-Bind: Investigating Low-Compute Methods on Low-Resource African Languages
Leong, Colin, Shandilya, Herumb, Dossou, Bonaventure F. P., Tonja, Atnafu Lambebo, Mathew, Joel, Omotayo, Abdul-Hakeem, Yousuf, Oreen, Akinjobi, Zainab, Emezue, Chris Chinenye, Muhammad, Shamsudeen, Kolawole, Steven, Choi, Younwoo, Adewumi, Tosin
Many natural language processing (NLP) tasks make use of massively pre-trained language models, which are computationally expensive. However, limited access to high-end computational resources, compounded by the scarcity of data for African languages, constitutes a real barrier to research experiments on these languages. In this work, we explore the applicability of low-compute approaches such as language adapters in the context of this low-resource double-bind. We intend to answer the following question: do language adapters allow those who are doubly bound by data and compute to practically build useful models? Through fine-tuning experiments on African languages, we evaluate their effectiveness as cost-effective approaches to low-resource African NLP. Using solely free compute resources, our results show that language adapters achieve performance comparable to that of massive pre-trained language models that are heavy on computational resources. This opens the door to further experimentation and exploration of the full extent of language adapters' capabilities.
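A rough sketch of the language-adapter recipe evaluated here (MAD-X style) is shown below, assuming the AdapterHub `adapters` library. Method names and adapter configurations vary across library versions, and in practice the language adapters would be trained separately with a masked-LM objective on monolingual text or downloaded from AdapterHub, so treat this as illustrative rather than definitive.

```python
# Rough sketch of the MAD-X-style adapter recipe; method names may differ
# across versions of the `adapters` library, and language adapters are
# normally trained separately (or loaded from AdapterHub), not declared inline.
import adapters
from adapters.composition import Stack
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)  # e.g. negative/neutral/positive
adapters.init(model)  # make the base model adapter-aware

# Placeholder language adapters (would be MLM-trained on monolingual text).
model.add_adapter("lang_source")
model.add_adapter("lang_target")

# Train only the small task adapter on labeled source-language data, with the
# source-language adapter stacked underneath; the large base model stays
# frozen, which is what makes this viable on free compute.
model.add_adapter("task_sentiment")
model.train_adapter("task_sentiment")
model.set_active_adapters(Stack("lang_source", "task_sentiment"))
# ... run the usual Trainer loop here ...

# Zero-shot transfer: swap in the target-language adapter, keep the task adapter.
model.set_active_adapters(Stack("lang_target", "task_sentiment"))
```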
The African Stopwords project: curating stopwords for African languages
Emezue, Chris, Nigatu, Hellina, Thinwa, Cynthia, Zhou, Helper, Muhammad, Shamsuddeen, Louis, Lerato, Abdulmumin, Idris, Oyerinde, Samuel, Ajibade, Benjamin, Samuel, Olanrewaju, Joshua, Oviawe, Onwuegbuzia, Emeka, Emezue, Handel, Ige, Ifeoluwatayo A., Tonja, Atnafu Lambebo, Chukwuneke, Chiamaka, Dossou, Bonaventure F. P., Etori, Naome A., Emmanuel, Mbonu Chinedu, Yousuf, Oreen, Aina, Kaosarat, David, Davis
Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing text data is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several curated stopword lists, low-resource languages, such as those spoken on the African continent, have no standardized lists available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. When analysing text data and building NLP models, stopwords might not add much value to the meaning of the document, depending on the NLP task at hand (e.g. text classification) (Singh, 2019).
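In preprocessing terms, applying such a curated list is straightforward; the sketch below assumes a plain-text stopword file (one word per line), with the file name purely hypothetical.

```python
# Minimal sketch of stopword removal using a curated per-language list.
def load_stopwords(path):
    """Read a plain-text stopword file, one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def remove_stopwords(text, stopwords):
    # Simple whitespace tokenization; real pipelines would use a
    # language-appropriate tokenizer.
    return " ".join(tok for tok in text.split() if tok.lower() not in stopwords)


stopwords = load_stopwords("hausa_stopwords.txt")  # hypothetical file name
cleaned = remove_stopwords("some text in the target language", stopwords)
```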
AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages
Dossou, Bonaventure F. P., Tonja, Atnafu Lambebo, Yousuf, Oreen, Osei, Salomey, Oppong, Abigail, Shode, Iyanuoluwa, Awoyomi, Oluwabusayo Olufunke, Emezue, Chris Chinenye
In recent years, multilingual pre-trained language models have gained prominence due to their remarkable performance on numerous downstream Natural Language Processing (NLP) tasks. However, pre-training these large multilingual language models requires a lot of training data, which is not available for African languages. Active learning is a semi-supervised learning algorithm in which a model consistently and dynamically learns to identify the most beneficial samples to train itself on, in order to achieve better optimization and performance on downstream tasks. Furthermore, active learning effectively and practically addresses real-world data scarcity. Despite all its benefits, active learning, in the context of NLP and especially multilingual language model pretraining, has received little consideration. In this paper, we present AfroLM, a multilingual language model pretrained from scratch on 23 African languages (the largest effort to date) using our novel self-active learning framework. Pretrained on a dataset significantly (14x) smaller than existing baselines, AfroLM outperforms many multilingual pretrained language models (AfriBERTa, XLMR-base, mBERT) on various downstream NLP tasks (NER, text classification, and sentiment analysis). Additional out-of-domain sentiment analysis experiments show that AfroLM is able to generalize well across various domains. We release the source code and the datasets used in our framework at https://github.com/bonaventuredossou/MLM_AL.
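The acquisition step of such a loop can be illustrated as below: score unlabeled sentences by a pseudo masked-LM loss under the current model and move the highest-loss (least well modelled) ones into the next pretraining round. This is a toy sketch of the general idea, not AfroLM's actual framework, and the base checkpoint is an arbitrary stand-in.

```python
# Toy sketch of an active-learning acquisition step for MLM pretraining.
# The checkpoint is a stand-in; this is not AfroLM's actual framework.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
model.eval()


@torch.no_grad()
def pseudo_mlm_loss(sentence: str) -> float:
    """Average masked-LM loss over positions, masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt",
                    truncation=True, max_length=64)["input_ids"][0]
    losses = []
    for i in range(1, ids.size(0) - 1):       # skip <s> and </s>
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        labels = torch.full_like(ids, -100)   # -100 = ignored in the loss
        labels[i] = ids[i]
        out = model(input_ids=masked.unsqueeze(0), labels=labels.unsqueeze(0))
        losses.append(out.loss.item())
    return sum(losses) / max(len(losses), 1)


def acquire(train_pool, unlabeled_pool, k):
    """One acquisition round: move the k highest-loss sentences into the
    training pool; a full loop would re-pretrain the model between rounds."""
    ranked = sorted(unlabeled_pool, key=pseudo_mlm_loss, reverse=True)
    return train_pool + ranked[:k], ranked[k:]
```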
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
Adelani, David Ifeoluwa, Neubig, Graham, Ruder, Sebastian, Rijhwani, Shruti, Beukman, Michael, Palen-Michel, Chester, Lignos, Constantine, Alabi, Jesujoba O., Muhammad, Shamsuddeen H., Nabende, Peter, Dione, Cheikh M. Bamba, Bukula, Andiswa, Mabuya, Rooweither, Dossou, Bonaventure F. P., Sibanda, Blessing, Buzaaba, Happy, Mukiibi, Jonathan, Kalipe, Godson, Mbaye, Derguene, Taylor, Amelia, Kabore, Fatoumata, Emezue, Chris Chinenye, Aremu, Anuoluwapo, Ogayo, Perez, Gitau, Catherine, Munkoh-Buabeng, Edwin, Koagne, Victoire M., Tapo, Allahsera Auguste, Macucwa, Tebogo, Marivate, Vukosi, Mboning, Elvis, Gwadabe, Tajuddeen, Adewumi, Tosin, Ahia, Orevaoghene, Nakatumba-Nabende, Joyce, Mokono, Neo L., Ezeani, Ignatius, Chukwuneke, Chiamaka, Adeyemi, Mofetoluwa, Hacheme, Gilles Q., Abdulmumin, Idris, Ogundepo, Odunayo, Yousuf, Oreen, Ngoli, Tatiana Moteu, Klakow, Dietrich
African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Abdulmumin, Idris, Beukman, Michael, Alabi, Jesujoba O., Emezue, Chris, Asiko, Everlyn, Adewumi, Tosin, Muhammad, Shamsuddeen Hassan, Adeyemi, Mofetoluwa, Yousuf, Oreen, Singh, Sahib, Gwadabe, Tajuddeen Rabiu
We participated in the WMT 2022 Large-Scale Machine Translation Evaluation for the African Languages Shared Task. This work describes our approach, which is based on filtering the given noisy data using a sentence-pair classifier that was built by fine-tuning a pre-trained language model. To train the classifier, we obtain positive samples (i.e. high-quality parallel sentences) from a gold-standard curated dataset and extract negative samples (i.e. low-quality parallel sentences) from automatically aligned parallel data by choosing sentences with low alignment scores. Our final machine translation model was then trained on filtered data, instead of the entire noisy dataset. We empirically validate our approach by evaluating on two common datasets and show that data filtering generally improves overall translation quality, in some cases even significantly.
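The filtering step can be sketched as follows: a fine-tuned sentence-pair classifier assigns each source/target pair a probability of being genuinely parallel, and only pairs above a threshold are kept for MT training. The checkpoint path and the positive-class index are assumptions; the classifier itself would be obtained by fine-tuning a pre-trained language model on the positive and negative samples described above.

```python
# Minimal sketch of classifier-based data filtering; the checkpoint path and
# the positive-class index (1 = "parallel") are assumptions for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CKPT = "path/to/finetuned-pair-classifier"  # hypothetical fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForSequenceClassification.from_pretrained(CKPT)  # 2 labels: noisy vs. parallel
model.eval()


@torch.no_grad()
def keep_pair(src: str, tgt: str, threshold: float = 0.5) -> bool:
    """Return True if the classifier judges (src, tgt) to be high-quality parallel."""
    enc = tokenizer(src, tgt, return_tensors="pt", truncation=True, max_length=256)
    probs = torch.softmax(model(**enc).logits, dim=-1)
    return probs[0, 1].item() >= threshold


def filter_corpus(pairs, threshold=0.5):
    """Keep only the sentence pairs that pass the classifier."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, threshold)]
```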