AITopics | Ahmad, Ibrahim Said

Collaborating Authors

Ahmad, Ibrahim Said

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mitigating Translationese in Low-resource Languages: The Storyboard Approach

Kuwanto, Garry, Urua, Eno-Abasi E., Amuok, Priscilla Amondi, Muhammad, Shamsuddeen Hassan, Aremu, Anuoluwapo, Otiende, Verrah, Nanyanga, Loice Emma, Nyoike, Teresiah W., Akpan, Aniefon D., Udouboh, Nsima Ab, Archibong, Idongesit Udeme, Moses, Idara Effiong, Ige, Ifeoluwatayo A., Ajibade, Benjamin, Awokoya, Olumide Benjamin, Abdulmumin, Idris, Aliyu, Saminu Mohammad, Iro, Ruqayya Nasir, Ahmad, Ibrahim Said, Smith, Deontae, Michaels, Praise-EL, Adelani, David Ifeoluwa, Wijaya, Derry Tanti, Andy, Anietie

arXiv.org Artificial IntelligenceJul-14-2024

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates worse accuracy but better fluency in the language focused.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2407.10152

Country:

Europe (1.00)
North America > United States (0.95)
Asia > Middle East > UAE (0.14)

Genre: Research Report > Experimental Study (0.47)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

Add feedback

Nollywood: Let's Go to the Movies!

Ortega, John E., Ahmad, Ibrahim Said, Chen, William

arXiv.org Artificial IntelligenceJul-2-2024

Nollywood, based on the idea of Bollywood from India, is a series of outstanding movies that originate from Nigeria. Unfortunately, while the movies are in English, they are hard to understand for many native speakers due to the dialect of English that is spoken. In this article, we accomplish two goals: (1) create a phonetic sub-title model that is able to translate Nigerian English speech to American English and (2) use the most advanced toxicity detectors to discover how toxic the speech is. Our aim is to highlight the text in these videos which is often times ignored for lack of dialectal understanding due the fact that many people in Nigeria speak a native language like Hausa at home.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2407.02631

Country:

North America > United States (0.92)
Africa (0.58)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.71)

Add feedback

Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT

Ahmad, Ibrahim Said, Dudy, Shiran, Ramachandranpillai, Resmi, Church, Kenneth

arXiv.org Artificial IntelligenceJun-27-2024

Large Language Models (LLMs), such as ChatGPT, are widely used to generate content for various purposes and audiences. However, these models may not reflect the cultural and emotional diversity of their users, especially for low-resource languages. In this paper, we investigate how ChatGPT represents Hausa's culture and emotions. We compare responses generated by ChatGPT with those provided by native Hausa speakers on 37 culturally relevant questions. We conducted experiments using emotion analysis and applied two similarity metrics to measure the alignment between human and ChatGPT responses. We also collected human participants ratings and feedback on ChatGPT responses. Our results show that ChatGPT has some level of similarity to human responses, but also exhibits some gaps and biases in its knowledge and awareness of the Hausa culture and emotions. We discuss the implications and limitations of our methodology and analysis and suggest ways to improve the performance and evaluation of LLMs for low-resource languages.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.19504

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

Ousidhoum, Nedjma, Muhammad, Shamsuddeen Hassan, Abdalla, Mohamed, Abdulmumin, Idris, Ahmad, Ibrahim Said, Ahuja, Sanchit, Aji, Alham Fikri, Araujo, Vladimir, Beloucif, Meriem, De Kock, Christine, Hourrane, Oumaima, Shrivastava, Manish, Solorio, Thamar, Surange, Nirmal, Vishnubhotla, Krishnapriya, Yimam, Seid Muhie, Mohammad, Saif M.

arXiv.org Artificial IntelligenceApr-17-2024

We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by the relatively limited availability of NLP resources. Each instance in the datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. Participating systems were asked to rank sentence pairs by their closeness in meaning (i.e., their degree of semantic relatedness) in the 14 languages in three main tracks: (a) supervised, (b) unsupervised, and (c) crosslingual. The task attracted 163 participants. We received 70 submissions in total (across all tasks) from 51 different teams, and 38 system description papers. We report on the best-performing systems as well as the most common and the most effective approaches for the three different tracks.

computational linguistic, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2403.18933

Country:

Europe (0.93)
North America > Mexico > Mexico City (0.19)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages

Ousidhoum, Nedjma, Muhammad, Shamsuddeen Hassan, Abdalla, Mohamed, Abdulmumin, Idris, Ahmad, Ibrahim Said, Ahuja, Sanchit, Aji, Alham Fikri, Araujo, Vladimir, Ayele, Abinew Ali, Baswani, Pavan, Beloucif, Meriem, Biemann, Chris, Bourhim, Sofia, De Kock, Christine, Dekebo, Genet Shanko, Hourrane, Oumaima, Kanumolu, Gopichand, Madasu, Lokesh, Rutunda, Samuel, Shrivastava, Manish, Solorio, Thamar, Surange, Nirmal, Tilaye, Hailegnaw Getaneh, Vishnubhotla, Krishnapriya, Winata, Genta, Yimam, Seid Muhie, Mohammad, Saif M.

arXiv.org Artificial IntelligenceFeb-15-2024

Exploring and quantifying semantic relatedness is central to representing language. It holds significant implications across various NLP tasks, including offering insights into the capabilities and performance of Large Language Models (LLMs). While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 14 languages:Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by a relatively limited availability of NLP resources. Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. The scores are obtained using a comparative annotation framework. We describe the data collection and annotation processes, related challenges when building the datasets, and their impact and utility in NLP. We further report experiments for each language and across the different languages.

artificial intelligence, natural language, text processing, (19 more...)

arXiv.org Artificial Intelligence

2402.08638

Country:

Europe (1.00)
Asia (1.00)
Africa (0.88)
(2 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.67)
Media (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Add feedback

Analyzing COVID-19 Vaccination Sentiments in Nigerian Cyberspace: Insights from a Manually Annotated Twitter Dataset

Ahmad, Ibrahim Said, Aliyu, Lukman Jibril, Khalid, Abubakar Auwal, Aliyu, Saminu Muhammad, Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Abduljalil, Bala Mairiga, Bello, Bello Shehu, Abubakar, Amina Imam

arXiv.org Artificial IntelligenceJan-23-2024

Numerous successes have been achieved in combating the COVID-19 pandemic, initially using various precautionary measures like lockdowns, social distancing, and the use of face masks. More recently, various vaccinations have been developed to aid in the prevention or reduction of the severity of the COVID-19 infection. Despite the effectiveness of the precautionary measures and the vaccines, there are several controversies that are massively shared on social media platforms like Twitter. In this paper, we explore the use of state-of-the-art transformer-based language models to study people's acceptance of vaccines in Nigeria. We developed a novel dataset by crawling multi-lingual tweets using relevant hashtags and keywords. Our analysis and visualizations revealed that most tweets expressed neutral sentiments about COVID-19 vaccines, with some individuals expressing positive views, and there was no strong preference for specific vaccine types, although Moderna received slightly more positive sentiment. We also found out that fine-tuning a pre-trained LLM with an appropriate dataset can yield competitive results, even if the LLM was not initially pre-trained on the specific language of that dataset.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2401.13133

Country: Africa > Nigeria (0.93)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Leveraging Closed-Access Multilingual Embedding for Automatic Sentence Alignment in Low Resource Languages

Abdulmumin, Idris, Khalid, Auwal Abubakar, Muhammad, Shamsuddeen Hassan, Ahmad, Ibrahim Said, Aliyu, Lukman Jibril, Sani, Babangida, Abduljalil, Bala Mairiga, Hassan, Sani Ahmad

arXiv.org Artificial IntelligenceNov-20-2023

The importance of qualitative parallel data in machine translation has long been determined but it has always been very difficult to obtain such in sufficient quantity for the majority of world languages, mainly because of the associated cost and also the lack of accessibility to these languages. Despite the potential for obtaining parallel datasets from online articles using automatic approaches, forensic investigations have found a lot of quality-related issues such as misalignment, and wrong language codes. In this work, we present a simple but qualitative parallel sentence aligner that carefully leveraged the closed-access Cohere multilingual embedding, a solution that ranked second in the just concluded #CoHereAIHack 2023 Challenge (see https://ai6lagos.devpost.com). The proposed approach achieved $94.96$ and $54.83$ f1 scores on FLORES and MAFAND-MT, compared to $3.64$ and $0.64$ of LASER respectively. Our method also achieved an improvement of more than 5 BLEU scores over LASER, when the resulting datasets were used with MAFAND-MT dataset to train translation models. Our code and data are available for research purposes here (https://github.com/abumafrim/Cohere-Align).

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2311.12179

Country:

Europe (0.68)
Asia > Middle East > UAE (0.14)
North America > United States > Texas (0.14)
North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

Parida, Shantipriya, Abdulmumin, Idris, Muhammad, Shamsuddeen Hassan, Bose, Aneesh, Kohli, Guneet Singh, Ahmad, Ibrahim Said, Kotwal, Ketan, Sarkar, Sayan Deb, Bojar, Ondřej, Kakudi, Habeebah Adamu

arXiv.org Artificial IntelligenceMay-28-2023

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold standard English-Hausa parallel sentences that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, text-only and multimodal machine translation.

artificial intelligence, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2305.1769

Country:

North America > United States > New Mexico (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages

Ogundepo, Odunayo, Gwadabe, Tajuddeen R., Rivera, Clara E., Clark, Jonathan H., Ruder, Sebastian, Adelani, David Ifeoluwa, Dossou, Bonaventure F. P., DIOP, Abdou Aziz, Sikasote, Claytone, Hacheme, Gilles, Buzaaba, Happy, Ezeani, Ignatius, Mabuya, Rooweither, Osei, Salomey, Emezue, Chris, Kahira, Albert Njoroge, Muhammad, Shamsuddeen H., Oladipo, Akintunde, Owodunni, Abraham Toluwase, Tonja, Atnafu Lambebo, Shode, Iyanuoluwa, Asai, Akari, Ajayi, Tunde Oluwaseyi, Siro, Clemencia, Arthur, Steven, Adeyemi, Mofetoluwa, Ahia, Orevaoghene, Aremu, Anuoluwapo, Awosan, Oyinkansola, Chukwuneke, Chiamaka, Opoku, Bernard, Ayodele, Awokoya, Otiende, Verrah, Mwase, Christine, Sinkala, Boyd, Rubungo, Andre Niyongabo, Ajisafe, Daniel A., Onwuegbuzia, Emeka Felix, Mbow, Habib, Niyomutabazi, Emile, Mukonde, Eunice, Lawan, Falalu Ibrahim, Ahmad, Ibrahim Said, Alabi, Jesujoba O., Namukombo, Martin, Chinedu, Mbonu, Phiri, Mofya, Putini, Neo, Mngoma, Ndumiso, Amuok, Priscilla A., Iro, Ruqayya Nasir, Adhiambo, Sonia

arXiv.org Artificial IntelligenceMay-11-2023

African languages have far less in-language content available digitally, making it challenging for question answering systems to satisfy the information needs of users. Cross-lingual open-retrieval question answering (XOR QA) systems -- those that retrieve answer content from other languages while serving people in their native language -- offer a means of filling this gap. To this end, we create AfriQA, the first cross-lingual QA dataset with a focus on African languages. AfriQA includes 12,000+ XOR QA examples across 10 African languages. While previous datasets have focused primarily on languages where cross-lingual QA augments coverage from the target language, AfriQA focuses on languages where cross-lingual answer content is the only high-coverage source of answer content. Because of this, we argue that African languages are one of the most important and realistic use cases for XOR QA. Our experiments demonstrate the poor performance of automatic translation and multilingual retrieval methods. Overall, AfriQA proves challenging for state-of-the-art QA models. We hope that the dataset enables the development of more equitable QA technology.

artificial intelligence, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

2305.06897

Country:

Europe (1.00)
Asia > Middle East > UAE (0.14)
North America > Canada > Quebec (0.14)
Africa > Nigeria > Kaduna State (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and Side-Information for Multi-Level Sexism Classification

Aliyu, Saminu Mohammad, Abdulmumin, Idris, Muhammad, Shamsuddeen Hassan, Ahmad, Ibrahim Said, Salahudeen, Saheed Abdullahi, Yusuf, Aliyu, Lawan, Falalu Ibrahim

arXiv.org Artificial IntelligenceApr-28-2023

We present the findings of our participation in the SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) task, a shared task on offensive language (sexism) detection on English Gab and Reddit dataset. We investigated the effects of transferring two language models: XLM-T (sentiment classification) and HateBERT (same domain -- Reddit) for multi-level classification into Sexist or not Sexist, and other subsequent sub-classifications of the sexist data. We also use synthetic classification of unlabelled dataset and intermediary class information to maximize the performance of our models. We submitted a system in Task A, and it ranked 49th with F1-score of 0.82. This result showed to be competitive as it only under-performed the best system by 0.052% F1-score.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.00076

Country:

North America (0.47)
Africa > Nigeria > Kaduna State (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (0.56)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.48)

Add feedback