AITopics | Abdulmumin, Idris


Abdulmumin, Idris


The African Stopwords project: curating stopwords for African languages

Emezue, Chris, Nigatu, Hellina, Thinwa, Cynthia, Zhou, Helper, Muhammad, Shamsuddeen, Louis, Lerato, Abdulmumin, Idris, Oyerinde, Samuel, Ajibade, Benjamin, Samuel, Olanrewaju, Joshua, Oviawe, Onwuegbuzia, Emeka, Emezue, Handel, Ige, Ifeoluwatayo A., Tonja, Atnafu Lambebo, Chukwuneke, Chiamaka, Dossou, Bonaventure F. P., Etori, Naome A., Emmanuel, Mbonu Chinedu, Yousuf, Oreen, Aina, Kaosarat, David, Davis

arXiv.org Artificial Intelligence, Mar-21-2023

Stopwords are fundamental in Natural Language Processing (NLP) techniques for information retrieval. One of the common tasks in preprocessing text data is the removal of stopwords. Currently, while high-resource languages like English benefit from the availability of several stopword lists, low-resource languages, such as those found on the African continent, have none that are standardized and available for use in NLP packages. Stopwords in the context of African languages are understudied and can reveal information about the crossover between languages. The African Stopwords project aims to study and curate stopwords for African languages. When analysing text data and building various NLP models, stopwords might not add much value to the meaning of the document (Singh, 2019), depending on the NLP task (such as text classification).
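
As an illustration of the preprocessing step the abstract mentions, stopword removal can be sketched in a few lines of Python. This is a minimal sketch, not the project's code: the function name `remove_stopwords`, the whitespace tokenization, and the tiny English stopword set are all assumptions made for the example.

```python
def remove_stopwords(text, stopwords):
    """Lowercase the text, split on whitespace, and drop tokens in the stopword set."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in stopwords]


# Hypothetical, illustrative stopword set (not a curated list from the project).
STOPWORDS = {"the", "is", "of", "a", "in"}

filtered = remove_stopwords("The removal of stopwords is a common step", STOPWORDS)
print(filtered)  # ['removal', 'stopwords', 'common', 'step']
```

A curated list for a given language would simply replace `STOPWORDS`; a realistic pipeline would also need a tokenizer suited to that language.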

artificial intelligence, information retrieval, natural language, (14 more...)

arXiv.org Artificial Intelligence

2304.12155

Country:

Europe (0.30)
North America > United States > Wisconsin (0.15)

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)


HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria

Aliyu, Saminu Mohammad, Wajiga, Gregory Maksha, Murtala, Muhammad, Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Ahmad, Ibrahim Said

arXiv.org Artificial Intelligence, Nov-28-2022

Social media platforms allow users to freely share their opinions about issues or anything they feel like. However, they also make it easier to spread hate and abusive content. The Fulani ethnic group has been the victim of this unfortunate phenomenon. This paper introduces HERDPhobia, the first annotated hate speech dataset on Fulani herders in Nigeria, in three languages: English, Nigerian-Pidgin, and Hausa. We present a benchmark experiment using pre-trained language models to classify the tweets as either hateful or non-hateful. Our experiment shows that the XML-T model provides better performance with 99.83% weighted F1. We released the dataset at https://github.com/hausanlp/HERDPhobia for further research.
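
For readers unfamiliar with the metric reported above, weighted F1 is the per-class F1 score averaged with weights proportional to each class's number of true instances. The following is a minimal pure-Python sketch of that definition; the function name `weighted_f1` and the toy labels are assumptions for illustration and are not taken from the paper or its data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its share of true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (count / total) * f1
    return score


# Toy labels, invented for the example.
y_true = ["hate", "hate", "non", "non", "non", "non"]
y_pred = ["hate", "non", "non", "non", "non", "non"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a model can score highly on weighted F1 while performing less well on a rare class, which is worth keeping in mind for imbalanced hate speech data.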

artificial intelligence, natural language, tweet, (18 more...)

arXiv.org Artificial Intelligence

2211.15262

Country:

Africa > Nigeria (0.90)
North America > United States > Minnesota (0.29)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Adelani, David Ifeoluwa, Neubig, Graham, Ruder, Sebastian, Rijhwani, Shruti, Beukman, Michael, Palen-Michel, Chester, Lignos, Constantine, Alabi, Jesujoba O., Muhammad, Shamsuddeen H., Nabende, Peter, Dione, Cheikh M. Bamba, Bukula, Andiswa, Mabuya, Rooweither, Dossou, Bonaventure F. P., Sibanda, Blessing, Buzaaba, Happy, Mukiibi, Jonathan, Kalipe, Godson, Mbaye, Derguene, Taylor, Amelia, Kabore, Fatoumata, Emezue, Chris Chinenye, Aremu, Anuoluwapo, Ogayo, Perez, Gitau, Catherine, Munkoh-Buabeng, Edwin, Koagne, Victoire M., Tapo, Allahsera Auguste, Macucwa, Tebogo, Marivate, Vukosi, Mboning, Elvis, Gwadabe, Tajuddeen, Adewumi, Tosin, Ahia, Orevaoghene, Nakatumba-Nabende, Joyce, Mokono, Neo L., Ezeani, Ignatius, Chukwuneke, Chiamaka, Adeyemi, Mofetoluwa, Hacheme, Gilles Q., Abdulmumin, Idris, Ogundepo, Odunayo, Yousuf, Oreen, Ngoli, Tatiana Moteu, Klakow, Dietrich

arXiv.org Artificial Intelligence, Nov-15-2022

African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically-diverse African languages.
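
The comparison described above (best transfer language versus English, averaged over target languages) amounts to a simple calculation. The sketch below assumes a table of zero-shot F1 scores indexed by target and source language; the function name `average_best_source_gain`, the language codes, and every number are hypothetical and do not come from the paper.

```python
def average_best_source_gain(f1_by_target, baseline="eng"):
    """Mean improvement of the best source language over the baseline source.

    f1_by_target maps each target language to a dict of
    {source language: zero-shot F1 on that target}.
    """
    gains = []
    for by_source in f1_by_target.values():
        best = max(by_source.values())
        gains.append(best - by_source[baseline])
    return sum(gains) / len(gains)


# Hypothetical scores, invented for the example.
scores = {
    "hau": {"eng": 60.0, "swa": 70.0},
    "yor": {"eng": 40.0, "hau": 50.0, "swa": 45.0},
}
print(average_best_source_gain(scores))  # 10.0
```

Since the baseline is itself one of the candidate sources, the gain for any target is never negative under this definition.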

computational linguistic, information retrieval, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.12391

Country:

Europe (1.00)
Asia (1.00)
Africa (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)


Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages

Abdulmumin, Idris, Beukman, Michael, Alabi, Jesujoba O., Emezue, Chris, Asiko, Everlyn, Adewumi, Tosin, Muhammad, Shamsuddeen Hassan, Adeyemi, Mofetoluwa, Yousuf, Oreen, Singh, Sahib, Gwadabe, Tajuddeen Rabiu

arXiv.org Artificial Intelligence, Oct-20-2022

We participated in the WMT 2022 Large-Scale Machine Translation Evaluation for the African Languages Shared Task. This work describes our approach, which is based on filtering the given noisy data using a sentence-pair classifier that was built by fine-tuning a pre-trained language model. To train the classifier, we obtain positive samples (i.e. high-quality parallel sentences) from a gold-standard curated dataset and extract negative samples (i.e. low-quality parallel sentences) from automatically aligned parallel data by choosing sentences with low alignment scores. Our final machine translation model was then trained on filtered data, instead of the entire noisy dataset. We empirically validate our approach by evaluating on two common datasets and show that data filtering generally improves overall translation quality, in some cases even significantly.
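
The filtering step described above can be sketched as keeping only sentence pairs whose quality score clears a threshold. This is a minimal sketch under stated assumptions: in the paper the scorer is a fine-tuned sentence-pair classifier, whereas here `length_ratio_score` is a crude stand-in so the example runs without a model. The function names, the threshold of 0.5, and the sample pairs are all hypothetical.

```python
def length_ratio_score(source, target):
    """Stand-in scorer (NOT the paper's classifier): ratio of shorter to longer word count."""
    n_src, n_tgt = len(source.split()), len(target.split())
    longest = max(n_src, n_tgt)
    return min(n_src, n_tgt) / longest if longest else 0.0


def filter_parallel_data(pairs, score_fn, threshold=0.5):
    """Keep (source, target) pairs whose score is at least the threshold."""
    return [(src, tgt) for src, tgt in pairs if score_fn(src, tgt) >= threshold]


# Invented sample pairs for the example.
noisy_pairs = [
    ("good morning", "ina kwana"),
    ("hello", "this is a very long unrelated sentence indeed"),
]
clean_pairs = filter_parallel_data(noisy_pairs, length_ratio_score)
print(clean_pairs)  # [('good morning', 'ina kwana')]
```

Passing the scorer as a function keeps the filter independent of how quality is estimated, so a classifier's predicted probability could be dropped in for `length_ratio_score` without changing the rest.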

artificial intelligence, natural language, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2210.10692

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis

Muhammad, Shamsuddeen Hassan, Adelani, David Ifeoluwa, Ruder, Sebastian, Ahmad, Ibrahim Said, Abdulmumin, Idris, Bello, Bello Shehu, Choudhury, Monojit, Emezue, Chris Chinenye, Abdullahi, Saheed Salahudeen, Aremu, Anuoluwapo, Jeorge, Alipio, Brazdil, Pavel

arXiv.org Artificial Intelligence, Jan-28-2022

Sentiment analysis is one of the most widely studied applications in NLP, but most work focuses on languages with large amounts of data. We introduce the first large-scale human-annotated Twitter sentiment dataset for the four most widely spoken languages in Nigeria (Hausa, Igbo, Nigerian-Pidgin, and Yorùbá) consisting of around 30,000 annotated tweets per language (and 14,000 for Nigerian-Pidgin), including a significant fraction of code-mixed tweets. We propose text collection, filtering, processing and labeling methods that enable us to create datasets for these low-resource languages. We evaluate a range of pre-trained models and transfer strategies on the dataset. We find that language-specific models and language-adaptive fine-tuning generally perform best. We release the datasets, trained models, sentiment lexicons, and code to incentivize research on sentiment analysis in under-represented languages.

artificial intelligence, natural language, tweet, (17 more...)

arXiv.org Artificial Intelligence

2201.08277

Country:

Europe (1.00)
Africa (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
