Muhammad, Shamsuddeen Hassan
Exploring Cultural Nuances in Emotion Perception Across 15 African Languages
Ahmad, Ibrahim Said, Dudy, Shiran, Belay, Tadesse Destaw, Abdulmumin, Idris, Yimam, Seid Muhie, Muhammad, Shamsuddeen Hassan, Church, Kenneth
Understanding how emotions are expressed across languages is vital for building culturally aware and inclusive NLP systems. However, emotion expression in African languages is understudied, limiting the development of effective emotion detection tools for these languages. In this work, we present a cross-linguistic analysis of emotion expression in 15 African languages. We examine four key dimensions of emotion representation: text length, sentiment polarity, emotion co-occurrence, and intensity variation. Our findings reveal diverse language-specific patterns in emotional expression -- Somali texts are typically longer, while languages such as IsiZulu and Algerian Arabic show more concise emotional expression. We observe a higher prevalence of negative sentiment in several Nigerian languages compared to lower negativity in languages such as IsiXhosa. Further, emotion co-occurrence analysis demonstrates strong cross-linguistic associations between specific emotion pairs (anger-disgust, sadness-fear), suggesting universal psychological connections. Intensity distributions show multimodal patterns with significant variation between language families: Bantu languages display similar yet distinct profiles, while Afroasiatic languages and Nigerian Pidgin demonstrate wider intensity ranges. These findings highlight the need for language-specific approaches to emotion detection while identifying opportunities for transfer learning across related languages.
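For readers who want to reproduce the co-occurrence dimension of this analysis, the sketch below shows one minimal way to compute an emotion co-occurrence matrix from multi-label annotations. It is an illustration only, not the authors' code; the label set and example rows are hypothetical.

```python
# Minimal sketch: emotion co-occurrence counts from multi-label annotations.
# Column names and example rows are hypothetical, not the paper's data.
import numpy as np
import pandas as pd

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# Each row is one text; 1 means the emotion was annotated as present.
df = pd.DataFrame(
    [
        {"anger": 1, "disgust": 1, "fear": 0, "joy": 0, "sadness": 0, "surprise": 0},
        {"anger": 0, "disgust": 0, "fear": 1, "joy": 0, "sadness": 1, "surprise": 0},
        {"anger": 1, "disgust": 1, "fear": 1, "joy": 0, "sadness": 0, "surprise": 0},
    ]
)

labels = df[EMOTIONS].to_numpy()   # shape: (num_texts, num_emotions)
cooc = labels.T @ labels           # pairwise co-occurrence counts
np.fill_diagonal(cooc, 0)          # ignore self co-occurrence
cooc_df = pd.DataFrame(cooc, index=EMOTIONS, columns=EMOTIONS)
print(cooc_df)
```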
HausaNLP at SemEval-2025 Task 2: Entity-Aware Fine-tuning vs. Prompt Engineering in Entity-Aware Machine Translation
Abubakar, Abdulhamid, Abdulkadir, Hamidatu, Abdullahi, Ibrahim Rabiu, Khalid, Abubakar Auwal, Wali, Ahmad Mustapha, Umar, Amina Aminu, Bala, Maryam, Sani, Sani Abdullahi, Ahmad, Ibrahim Said, Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Marivate, Vukosi
This paper presents our findings for SemEval 2025 Task 2, a shared task on entity-aware machine translation (EA-MT). The goal of this task is to develop translation models that can accurately translate English sentences into target languages, with a particular focus on handling named entities, which often pose challenges for MT systems. The task covers 10 target languages with English as the source. In this paper, we describe the different systems we employed, detail our results, and discuss insights gained from our experiments.
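Although the abstract does not detail the individual systems, the prompt-engineering side of EA-MT is commonly approached by injecting a target-language rendering of the named entity into the translation prompt. The snippet below sketches that general idea under our own assumptions; the template, example entity gloss, and language pair are hypothetical and not taken from the paper.

```python
# Hedged sketch of an entity-aware translation prompt (illustrative only;
# the template and the example entity gloss are assumptions, not the paper's system).

def build_ea_prompt(source: str, entity: str, entity_gloss: str, target_lang: str) -> str:
    """Build a prompt that tells the model how to render a specific named entity."""
    return (
        f"Translate the following English sentence into {target_lang}.\n"
        f'The named entity "{entity}" must be rendered as "{entity_gloss}".\n'
        f"Sentence: {source}\n"
        f"Translation:"
    )

prompt = build_ea_prompt(
    source="The Louvre is the world's most-visited museum.",
    entity="The Louvre",
    entity_gloss="Il Louvre",   # hypothetical gloss for an Italian target
    target_lang="Italian",
)
print(prompt)
```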
Whispering in Amharic: Fine-tuning Whisper for Low-resource Language
Gete, Dawit Ketema, Ahamed, Bedru Yimam, Belay, Tadesse Destaw, Ejigu, Yohannes Ayana, Imam, Sukairaj Hafiz, Tessema, Alemu Belay, Adem, Mohammed Oumer, Belay, Tadesse Amare, Geislinger, Robert, Musa, Umma Aliyu, Semmann, Martin, Muhammad, Shamsuddeen Hassan, Schreiber, Henning, Yimam, Seid Muhie
This work explores fine-tuning OpenAI's Whisper automatic speech recognition (ASR) model for Amharic, a low-resource language, to improve transcription accuracy. While the foundational Whisper model struggles with Amharic due to its limited representation in the training data, we fine-tune it using datasets such as Mozilla Common Voice, FLEURS, and the BDU-speech dataset. The best-performing model, Whispersmall-am, improves significantly when fine-tuned on a mix of existing FLEURS data and new, unseen Amharic datasets. Training solely on the new data leads to poor performance, but combining it with FLEURS data reinforces the model and enables better specialization in Amharic. We also demonstrate that normalizing Amharic homophones substantially improves Word Error Rate (WER) and Bilingual Evaluation Understudy (BLEU) scores. This study underscores the importance of fine-tuning strategies and dataset composition for improving ASR in low-resource languages, providing insights for future Amharic speech recognition research.
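As a rough illustration of the kind of fine-tuning described above, the following is a minimal sketch of a single Whisper-small training step with Hugging Face Transformers. The checkpoint name, the toy audio/transcription pair, and the hyperparameters are assumptions for illustration; this is not the paper's training script.

```python
# Minimal sketch: one fine-tuning step of Whisper-small for Amharic transcription.
# "openai/whisper-small" is a public checkpoint used here as an assumption; the
# 30 s of silence and the dummy transcription stand in for real ASR training data.
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="amharic", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.train()

audio = np.zeros(16000 * 30, dtype=np.float32)  # toy 30 s clip at 16 kHz
inputs = processor.feature_extractor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("ሰላም ለዓለም", return_tensors="pt").input_ids  # dummy transcription

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
outputs = model(input_features=inputs.input_features, labels=labels)
outputs.loss.backward()
optimizer.step()
print(f"toy training-step loss: {outputs.loss.item():.3f}")
```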
AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text
Belay, Tadesse Destaw, Azime, Israel Abebe, Ahmad, Ibrahim Said, Abdulmumin, Idris, Ayele, Abinew Ali, Muhammad, Shamsuddeen Hassan, Yimam, Seid Muhie
Pretrained Language Models (PLMs) built from diverse sources are the foundation of today's NLP progress. Language representations learned by such models achieve strong performance across many tasks with datasets of varying sizes drawn from various sources. We present a thorough analysis of domain- and task-adaptive continual pretraining approaches for low-resource African languages and show promising results on the evaluated tasks. We create AfriSocial, a corpus designed for domain-adaptive fine-tuning that passes through careful quality pre-processing steps. Continually pretraining PLMs on AfriSocial as domain-adaptive pretraining (DAPT) data consistently improves performance on a fine-grained emotion classification task covering 16 target languages, by 1% to 28.27% macro F1. Likewise, the task-adaptive pretraining (TAPT) approach, which further fine-tunes on small amounts of unlabeled but task-similar data, shows promising results. For example, unlabeled sentiment data (source) used for the fine-grained emotion classification task (target) improves the base model results by 0.55% to 15.11% F1. Combining the two methods, DAPT + TAPT, also achieves better results than the base models. All resources will be made available to support low-resource NLP generally, as well as similar-domain tasks such as hate speech and sentiment analysis.
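A minimal sketch of the DAPT stage, assuming a public AfroXLMR-style checkpoint and a tiny in-memory corpus in place of AfriSocial, might look like the following. The model id, hyperparameters, and example corpus are assumptions for illustration, not the paper's setup.

```python
# Hedged sketch of domain-adaptive pretraining (DAPT): continue masked-language-model
# training of an XLM-R-style encoder on unlabeled social-media-style text.
# "Davlan/afro-xlmr-base" and the two toy sentences are assumptions, not AfriSocial.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "Davlan/afro-xlmr-base"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

corpus = Dataset.from_dict({"text": ["example social media sentence one",
                                     "example social media sentence two"]})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="afroxlmr-dapt", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("afroxlmr-dapt")  # adapted checkpoint later fine-tuned on the target task
```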
Who Wrote This? Identifying Machine vs Human-Generated Text in Hausa
Sani, Babangida, Soy, Aakansha, Imam, Sukairaj Hafiz, Mustapha, Ahmad, Aliyu, Lukman Jibril, Abdulmumin, Idris, Ahmad, Ibrahim Said, Muhammad, Shamsuddeen Hassan
The advancement of large language models (LLMs) has made them proficient in various tasks, including content generation. However, their unregulated use can enable malicious activities such as plagiarism and the generation and spread of fake news, especially in low-resource languages. Most existing machine-generated text detectors are trained on high-resource languages such as English and French. In this study, we developed the first large-scale detector that can distinguish between human- and machine-generated content in Hausa. We scraped seven Hausa-language media outlets for human-generated text and used the Gemini-2.0 Flash model to automatically generate corresponding Hausa-language articles from the human-written article headlines. We fine-tuned four pre-trained Afri-centric models (AfriTeVa, AfriBERTa, AfroXLMR, and AfroXLMR-76L) on the resulting dataset and assessed their performance using accuracy and F1-score metrics. AfroXLMR achieved the highest performance, with an accuracy of 99.23% and an F1 score of 99.21%, demonstrating its effectiveness for Hausa text detection. Our dataset is made publicly available to enable further research.
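The detector pipeline described above is essentially binary sequence-classification fine-tuning. The sketch below illustrates that setup with accuracy and F1 scoring; the checkpoint id, placeholder examples, and hyperparameters are assumptions rather than the paper's configuration.

```python
# Minimal sketch: fine-tuning an Afri-centric encoder as a binary
# human-vs-machine text classifier, scored with accuracy and F1.
# The checkpoint id and the two placeholder examples are assumptions.
import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "Davlan/afro-xlmr-base"  # assumed AfroXLMR checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2,
    id2label={0: "human", 1: "machine"}, label2id={"human": 0, "machine": 1},
)

data = Dataset.from_dict({
    "text": ["placeholder human-written news article",
             "placeholder machine-generated news article"],
    "label": [0, 1],
})
tokenized = data.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds)}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hausa-mgt-detector", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to=[]),
    train_dataset=tokenized,
    eval_dataset=tokenized,          # toy: reuses the same two examples
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```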
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Muhammad, Shamsuddeen Hassan, Ousidhoum, Nedjma, Abdulmumin, Idris, Yimam, Seid Muhie, Wahle, Jan Philip, Ruas, Terry, Beloucif, Meriem, De Kock, Christine, Belay, Tadesse Destaw, Ahmad, Ibrahim Said, Surange, Nirmal, Teodorescu, Daniela, Adelani, David Ifeoluwa, Aji, Alham Fikri, Ali, Felermino, Araujo, Vladimir, Ayele, Abinew Ali, Ignat, Oana, Panchenko, Alexander, Zhou, Yi, Mohammad, Saif M.
We present our shared task on text-based emotion detection, covering more than 30 languages from seven distinct language families. These languages are predominantly low-resource and spoken across various continents. Each data instance is multi-labeled with six emotion classes, with additional datasets in 11 languages annotated for emotion intensity. Participants were asked to predict labels in three tracks: (a) emotion labels in monolingual settings, (b) emotion intensity scores, and (c) emotion labels in cross-lingual settings. The task attracted over 700 participants; we received final submissions from more than 200 teams and 93 system description papers. We report baseline results, as well as findings on the best-performing systems, the most common approaches, and the most effective methods across various tracks and languages. The datasets for this task are publicly available.
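Track (a) is a multi-label classification problem. The following sketch shows one standard way to frame it with a multilingual encoder and an independent sigmoid output per emotion; the encoder choice, decision threshold, and example text are assumptions, and this is not the official task baseline.

```python
# Hedged sketch of a multi-label emotion classifier: six independent sigmoid
# outputs over one shared encoder. "xlm-roberta-base" and the 0.5 threshold
# are assumptions for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
model_id = "xlm-roberta-base"  # assumed multilingual encoder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(EMOTIONS),
    problem_type="multi_label_classification",  # uses BCE-with-logits loss
)

batch = tokenizer(["I can't believe they cancelled it again..."],
                  return_tensors="pt", truncation=True)
labels = torch.tensor([[1., 0., 0., 0., 1., 0.]])  # multi-hot gold: anger + sadness

outputs = model(**batch, labels=labels)             # training-style forward pass
probs = torch.sigmoid(outputs.logits)                # per-emotion probabilities
predicted = [e for e, p in zip(EMOTIONS, probs[0]) if p > 0.5]  # assumed threshold
print(outputs.loss.item(), predicted)
```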
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Muhammad, Shamsuddeen Hassan, Ousidhoum, Nedjma, Abdulmumin, Idris, Wahle, Jan Philip, Ruas, Terry, Beloucif, Meriem, de Kock, Christine, Surange, Nirmal, Teodorescu, Daniela, Ahmad, Ibrahim Said, Adelani, David Ifeoluwa, Aji, Alham Fikri, Ali, Felermino D. M. A., Alimova, Ilseyar, Araujo, Vladimir, Babakov, Nikolay, Baes, Naomi, Bucur, Ana-Maria, Bukula, Andiswa, Cao, Guanqun, Cardenas, Rodrigo Tufino, Chevi, Rendi, Chukwuneke, Chiamaka Ijeoma, Ciobotaru, Alexandra, Dementieva, Daryna, Gadanya, Murja Sani, Geislinger, Robert, Gipp, Bela, Hourrane, Oumaima, Ignat, Oana, Lawan, Falalu Ibrahim, Mabuya, Rooweither, Mahendra, Rahmad, Marivate, Vukosi, Piper, Andrew, Panchenko, Alexander, Ferreira, Charles Henrique Porto, Protasov, Vitaly, Rutunda, Samuel, Shrivastava, Manish, Udrea, Aura Cristina, Wanzare, Lilian Diana Awuor, Wu, Sophie, Wunderlich, Florian Valentin, Zhafran, Hanif Muhammad, Zhang, Tianhui, Zhou, Yi, Mohammad, Saif M.
People worldwide use language in subtle and complex ways to express emotions. While emotion recognition -- an umbrella term for several NLP tasks -- significantly impacts various applications in NLP and other fields, most work in the area has focused on high-resource languages. This has led to major disparities in research and proposed solutions, especially for low-resource languages that suffer from a lack of high-quality datasets. In this paper, we present BRIGHTER -- a collection of multi-labeled, emotion-annotated datasets in 28 different languages. BRIGHTER covers predominantly low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances from various domains annotated by fluent speakers. We describe the data collection and annotation processes and the challenges of building these datasets. We then report experimental results for monolingual and cross-lingual multi-label emotion identification, as well as intensity-level emotion recognition. We investigate results with and without using LLMs and analyse the large variability in performance across languages and text domains. We show that the BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition and discuss their impact and utility.
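For the intensity-level experiments mentioned above, Pearson correlation between predicted and gold intensities is a common evaluation choice; whether it matches this paper's exact protocol is an assumption. The toy example below shows the per-emotion computation.

```python
# Small sketch: scoring intensity-level predictions against gold ratings with
# Pearson correlation. The scores and per-emotion layout are hypothetical,
# not BRIGHTER results.
from scipy.stats import pearsonr

gold = {"joy":     [0, 2, 3, 1, 0, 2],
        "sadness": [3, 1, 0, 2, 2, 0]}
pred = {"joy":     [0, 1, 3, 1, 1, 2],
        "sadness": [2, 1, 0, 3, 2, 1]}

for emotion in gold:
    r, _ = pearsonr(gold[emotion], pred[emotion])
    print(f"{emotion}: Pearson r = {r:.3f}")
```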
INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages
Yu, Hao, Alabi, Jesujoba O., Bukula, Andiswa, Zhuang, Jian Yun, Lee, En-Shiun Annie, Guge, Tadesse Kebede, Azime, Israel Abebe, Buzaaba, Happy, Sibanda, Blessing Kudzaishe, Kalipe, Godson K., Mukiibi, Jonathan, Kabenamualu, Salomon Kabongo, Setaka, Mmasibidi, Ndolela, Lolwethu, Odu, Nkiruka, Mabuya, Rooweither, Muhammad, Shamsuddeen Hassan, Osei, Salomey, Samb, Sokhar, Murage, Juliet W., Klakow, Dietrich, Adelani, David Ifeoluwa
Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuned multilingual transformer models and prompted large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from English. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average F1 score of only 26. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuned baselines. Compared to English, GPT-4o and the fine-tuned baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that LLM performance still lags for many low-resource African languages, and more work is needed to improve their downstream performance.
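The prompting baselines above can be approximated with a simple zero-shot classification prompt. The sketch below uses the OpenAI chat API as one possible backend; the intent inventory, prompt wording, and example utterance are assumptions, not the paper's exact setup.

```python
# Hedged sketch of zero-shot intent detection via LLM prompting. The intent
# labels and prompt template are hypothetical; the OpenAI client is just one
# possible backend and requires OPENAI_API_KEY in the environment.
from openai import OpenAI

INTENTS = ["transfer_money", "book_flight", "restaurant_reservation", "check_balance"]

def classify_intent(utterance: str, model: str = "gpt-4o") -> str:
    client = OpenAI()
    prompt = (
        "Choose the single intent that best matches the utterance.\n"
        f"Intents: {', '.join(INTENTS)}\n"
        f"Utterance: {utterance}\n"
        "Answer with only the intent name."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# Example call (the Swahili-like utterance is a placeholder, not dataset content):
# print(classify_intent("Nataka kutuma pesa kwa mama yangu"))
```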
Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa
Sani, Sani Abdullahi, Muhammad, Shamsuddeen Hassan, Jarvis, Devon
Sentiment analysis (SA) plays a vital role in Natural Language Processing (NLP) by identifying sentiments expressed in text. Although significant advances have been made in SA for widely spoken languages, low-resource languages such as Hausa face unique challenges, primarily due to a lack of digital resources. This study investigates the effectiveness of Language-Adaptive Fine-Tuning (LAFT) for improving SA performance in Hausa. We first curate a diverse, unlabeled corpus to expand the model's linguistic capabilities, then apply LAFT to adapt AfriBERTa specifically to the nuances of the Hausa language. The adapted model is then fine-tuned on the labeled NaijaSenti sentiment dataset to evaluate its performance. Our findings demonstrate that LAFT gives modest improvements, which may be attributed to the use of formal Hausa text rather than informal social media data. Nevertheless, the pre-trained AfriBERTa model significantly outperformed models not specifically trained on Hausa, highlighting the importance of using pre-trained models in low-resource contexts. This research emphasizes the need for diverse data sources to advance NLP applications for low-resource African languages. We publish the code and the dataset to encourage further research and facilitate reproducibility in low-resource NLP here: https://github.com/Sani-Abdullahi-Sani/Natural-Language-Processing/blob/main/Sentiment%20Analysis%20for%20Low%20Resource%20African%20Languages
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Ayele, Abinew Ali, Adelani, David Ifeoluwa, Ahmad, Ibrahim Said, Aliyu, Saminu Mohammad, Onyango, Nelson Odhiambo, Wanzare, Lilian D. A., Rutunda, Samuel, Aliyu, Lukman Jibril, Alemneh, Esubalew, Hourrane, Oumaima, Gebremichael, Hagos Tesfahun, Ismail, Elyas Abdi, Beloucif, Meriem, Jibril, Ebrahim Chekol, Bukula, Andiswa, Mabuya, Rooweither, Osei, Salomey, Oppong, Abigail, Belay, Tadesse Destaw, Guge, Tadesse Kebede, Asfaw, Tesfa Tegegne, Chukwuneke, Chiamaka Ijeoma, Röttger, Paul, Yimam, Seid Muhie, Ousidhoum, Nedjma
Hate speech and abusive language are global phenomena that require socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) an absence of moderation and (2) censorship due to reliance on out-of-context keyword spotting. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is annotated by native speakers familiar with the local culture. We report the challenges related to constructing the datasets and present various classification baseline results with and without using LLMs. The datasets, individual annotations, and hate speech and offensive language lexicons are available at https://github.com/AfriHate/AfriHate
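The released lexicons naturally support simple keyword-spotting baselines which, as noted above, are unreliable out of context. The toy sketch below shows such a lexicon-matching baseline with neutral placeholder entries; nothing in it is taken from the AfriHate lexicons.

```python
# Illustrative sketch only: a naive, context-blind lexicon-matching baseline.
# The entries are neutral placeholders, not items from the AfriHate lexicons.
def lexicon_flag(text: str, lexicon: set[str]) -> bool:
    """Flag a text if any lexicon entry appears as a token (ignores context)."""
    tokens = text.lower().split()
    return any(token in lexicon for token in tokens)

placeholder_lexicon = {"placeholder_term_1", "placeholder_term_2"}
print(lexicon_flag("this sentence contains placeholder_term_1", placeholder_lexicon))
```

A contextual classifier fine-tuned on the annotated datasets, as in the reported baselines, is the intended alternative to this kind of context-blind matching.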