Hourrane, Oumaima
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
Muhammad, Shamsuddeen Hassan, Ousidhoum, Nedjma, Abdulmumin, Idris, Wahle, Jan Philip, Ruas, Terry, Beloucif, Meriem, de Kock, Christine, Surange, Nirmal, Teodorescu, Daniela, Ahmad, Ibrahim Said, Adelani, David Ifeoluwa, Aji, Alham Fikri, Ali, Felermino D. M. A., Alimova, Ilseyar, Araujo, Vladimir, Babakov, Nikolay, Baes, Naomi, Bucur, Ana-Maria, Bukula, Andiswa, Cao, Guanqun, Cardenas, Rodrigo Tufino, Chevi, Rendi, Chukwuneke, Chiamaka Ijeoma, Ciobotaru, Alexandra, Dementieva, Daryna, Gadanya, Murja Sani, Geislinger, Robert, Gipp, Bela, Hourrane, Oumaima, Ignat, Oana, Lawan, Falalu Ibrahim, Mabuya, Rooweither, Mahendra, Rahmad, Marivate, Vukosi, Piper, Andrew, Panchenko, Alexander, Ferreira, Charles Henrique Porto, Protasov, Vitaly, Rutunda, Samuel, Shrivastava, Manish, Udrea, Aura Cristina, Wanzare, Lilian Diana Awuor, Wu, Sophie, Wunderlich, Florian Valentin, Zhafran, Hanif Muhammad, Zhang, Tianhui, Zhou, Yi, Mohammad, Saif M.
People worldwide use language in subtle and complex ways to express emotions. While emotion recognition -- an umbrella term for several NLP tasks -- has a significant impact on applications in NLP and other fields, most work in the area focuses on high-resource languages. This has led to major disparities in research and proposed solutions, especially for low-resource languages, which suffer from a lack of high-quality datasets. In this paper, we present BRIGHTER -- a collection of multi-labeled, emotion-annotated datasets in 28 different languages. BRIGHTER covers predominantly low-resource languages from Africa, Asia, Eastern Europe, and Latin America, with instances from various domains annotated by fluent speakers. We describe the data collection and annotation processes and the challenges of building these datasets. We then report experimental results for monolingual and crosslingual multi-label emotion identification, as well as intensity-level emotion recognition. We investigate results with and without using LLMs and analyse the large variability in performance across languages and text domains. We show that the BRIGHTER datasets are a step towards bridging the gap in text-based emotion recognition and discuss their impact and utility.
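As an illustration of the multi-label setup the abstract describes, here is a minimal sketch of a classifier that may assign several emotions to one text. It is not the paper's baseline: the example texts, the emotion inventory, and the scikit-learn pipeline are all illustrative.

```python
# Minimal multi-label emotion classification sketch (not the BRIGHTER
# baseline itself). Texts and the emotion inventory are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "I can't believe we won the cup!",
    "The news last night left me shaken and afraid.",
    "What a disappointing, infuriating decision.",
]
labels = [["joy", "surprise"], ["fear", "sadness"], ["anger", "disgust"]]

# One binary indicator column per emotion; an instance may carry several.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# One-vs-rest logistic regression over character n-grams -- a common
# language-agnostic choice for low-resource, morphologically rich text.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)

pred = clf.predict(["Such a scary but thrilling match!"])
print(mlb.inverse_transform(pred))
```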
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Ayele, Abinew Ali, Adelani, David Ifeoluwa, Ahmad, Ibrahim Said, Aliyu, Saminu Mohammad, Onyango, Nelson Odhiambo, Wanzare, Lilian D. A., Rutunda, Samuel, Aliyu, Lukman Jibril, Alemneh, Esubalew, Hourrane, Oumaima, Gebremichael, Hagos Tesfahun, Ismail, Elyas Abdi, Beloucif, Meriem, Jibril, Ebrahim Chekol, Bukula, Andiswa, Mabuya, Rooweither, Osei, Salomey, Oppong, Abigail, Belay, Tadesse Destaw, Guge, Tadesse Kebede, Asfaw, Tesfa Tegegne, Chukwuneke, Chiamaka Ijeoma, Röttger, Paul, Yimam, Seid Muhie, Ousidhoum, Nedjma
Hate speech and abusive language are global phenomena that require socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been documented occurrences of (1) an absence of moderation and (2) censorship caused by reliance on out-of-context keyword spotting. Moreover, high-profile individuals have frequently been at the center of the moderation process, while large, targeted hate speech campaigns against minorities have been overlooked. These limitations stem mainly from the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is annotated by native speakers familiar with the local culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. The datasets, individual annotations, and hate speech and offensive language lexicons are available at https://github.com/AfriHate/AfriHate.
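For readers who want a starting point, below is a hedged sketch of a transformer fine-tuning baseline of the kind the abstract mentions. The label set (neutral / abusive / hate), the TSV layout, and the file path are assumptions to be adapted to the released AfriHate files; the encoder checkpoint is one plausible African-centric choice, not the paper's prescribed model.

```python
# Sketch of a transformer fine-tuning baseline for 3-way labels.
# The label set, column names, and file path are assumptions; adapt
# them to the released AfriHate data.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

MODEL = "Davlan/afro-xlmr-base"  # an African-centric encoder; any multilingual encoder works

df = pd.read_csv("afrihate_hau_train.tsv", sep="\t")  # hypothetical path
label2id = {"neutral": 0, "abusive": 1, "hate": 2}    # assumed label scheme
df["label"] = df["label"].map(label2id)

tok = AutoTokenizer.from_pretrained(MODEL)
ds = Dataset.from_pandas(df[["text", "label"]])
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=128), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)
trainer = Trainer(
    model=model,
    args=TrainingArguments("afrihate-baseline",
                           per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=ds,
    tokenizer=tok,
)
trainer.train()
```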
SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages
Ousidhoum, Nedjma, Muhammad, Shamsuddeen Hassan, Abdalla, Mohamed, Abdulmumin, Idris, Ahmad, Ibrahim Said, Ahuja, Sanchit, Aji, Alham Fikri, Araujo, Vladimir, Beloucif, Meriem, De Kock, Christine, Hourrane, Oumaima, Shrivastava, Manish, Solorio, Thamar, Surange, Nirmal, Vishnubhotla, Krishnapriya, Yimam, Seid Muhie, Mohammad, Saif M.
We present the first shared task on Semantic Textual Relatedness (STR). While earlier shared tasks primarily focused on semantic similarity, we instead investigate the broader phenomenon of semantic relatedness across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by a relatively limited availability of NLP resources. Each instance in the datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. Participating systems were asked to rank sentence pairs by their closeness in meaning (i.e., their degree of semantic relatedness) in the 14 languages in three main tracks: (a) supervised, (b) unsupervised, and (c) crosslingual. The task attracted 163 participants. We received 70 submissions in total (across all tracks) from 51 different teams, and 38 system description papers. We report on the best-performing systems as well as the most common and most effective approaches for the three different tracks.
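Systems in ranking tracks like these are compared against the gold scores via Spearman rank correlation. The snippet below, with made-up numbers, shows how such a comparison is computed with SciPy.

```python
# How a submission is scored in spirit: Spearman rank correlation between
# a system's relatedness scores and the gold scores. Numbers are made up.
from scipy.stats import spearmanr

gold_scores = [0.91, 0.12, 0.55, 0.78, 0.33]    # annotator-derived relatedness
system_scores = [0.80, 0.05, 0.60, 0.85, 0.20]  # a system's predictions

rho, p_value = spearmanr(gold_scores, system_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```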
SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages
Ousidhoum, Nedjma, Muhammad, Shamsuddeen Hassan, Abdalla, Mohamed, Abdulmumin, Idris, Ahmad, Ibrahim Said, Ahuja, Sanchit, Aji, Alham Fikri, Araujo, Vladimir, Ayele, Abinew Ali, Baswani, Pavan, Beloucif, Meriem, Biemann, Chris, Bourhim, Sofia, De Kock, Christine, Dekebo, Genet Shanko, Hourrane, Oumaima, Kanumolu, Gopichand, Madasu, Lokesh, Rutunda, Samuel, Shrivastava, Manish, Solorio, Thamar, Surange, Nirmal, Tilaye, Hailegnaw Getaneh, Vishnubhotla, Krishnapriya, Winata, Genta, Yimam, Seid Muhie, Mohammad, Saif M.
Exploring and quantifying semantic relatedness is central to representing language. It holds significant implications across various NLP tasks, including offering insights into the capabilities and performance of Large Language Models (LLMs). While earlier NLP research primarily focused on semantic similarity, often within the English language context, we instead investigate the broader phenomenon of semantic relatedness. In this paper, we present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 14 languages: Afrikaans, Algerian Arabic, Amharic, English, Hausa, Hindi, Indonesian, Kinyarwanda, Marathi, Moroccan Arabic, Modern Standard Arabic, Punjabi, Spanish, and Telugu. These languages originate from five distinct language families and are predominantly spoken in Africa and Asia -- regions characterised by a relatively limited availability of NLP resources. Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences. The scores are obtained using a comparative annotation framework. We describe the data collection and annotation processes, related challenges when building the datasets, and their impact and utility in NLP. We further report experiments for each language and across the different languages.
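A common unsupervised baseline for scoring such sentence pairs is the cosine similarity of multilingual sentence embeddings. The sketch below uses a generic off-the-shelf multilingual model, not one shipped with SemRel, and the sentence pair is invented.

```python
# A simple unsupervised relatedness score for a sentence pair: cosine
# similarity of multilingual sentence embeddings. The checkpoint is a
# generic multilingual model, not part of the SemRel release.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

s1 = "The farmers harvested the maize before the rains came."
s2 = "Crops were gathered from the fields ahead of the wet season."

e1, e2 = model.encode([s1, s2], convert_to_tensor=True)
print(f"relatedness score ~ {util.cos_sim(e1, e2).item():.3f}")
```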
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
Wang, Jiayi, Adelani, David Ifeoluwa, Agrawal, Sweta, Rei, Ricardo, Briakou, Eleftheria, Carpuat, Marine, Masiak, Marek, He, Xuanli, Bourhim, Sofia, Bukula, Andiswa, Mohamed, Muhidin, Olatoye, Temitayo, Mokayede, Hamam, Mwase, Christine, Kimotho, Wangui, Yuehgoh, Foutse, Aremu, Anuoluwapo, Ojo, Jessica, Muhammad, Shamsuddeen Hassan, Osei, Salomey, Omotayo, Abdul-Hakeem, Chukwuneke, Chiamaka, Ogayo, Perez, Hourrane, Oumaima, Anigri, Salma El, Ndolela, Lolwethu, Mangwana, Thabiso, Mohamed, Shafie Abdi, Hassan, Ayinde, Awoyomi, Oluwabusayo Olufunke, Alkhaled, Lama, Al-Azzawi, Sana, Etori, Naome A., Ochieng, Millicent, Siro, Clemencia, Njoroge, Samuel, Muchiri, Eric, Kimotho, Wangari, Momo, Lyse Naomi Wamba, Abolade, Daud, Ajao, Simbiat, Adewumi, Tosin, Shode, Iyanuoluwa, Macharm, Ricky, Iro, Ruqayya Nasir, Abdullahi, Saheed S., Moore, Stephen E., Opoku, Bernard, Akinjobi, Zainab, Afolabi, Abeeb, Obiefuna, Nnaemeka, Ogbu, Onyekachi Raphael, Brian, Sam, Otiende, Verrah Akinyi, Mbonu, Chinedu Emmanuel, Sari, Sakayo Toadoum, Stenetorp, Pontus
Despite the progress recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to accurately measure progress on these languages because evaluation is often performed with n-gram matching metrics such as BLEU, which tend to correlate poorly with human judgments. Embedding-based metrics such as COMET correlate better; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines such as Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages, by leveraging DA training data from high-resource languages and an African-centric multilingual encoder (AfroXLM-Roberta). AfriCOMET sets the state of the art for African-language MT evaluation with respect to Spearman-rank correlation with human judgments (+0.406).
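COMET-style metrics such as AfriCOMET are typically run through the `unbabel-comet` package. The sketch below shows the general scoring flow; the checkpoint ID is an assumption (check the AfriCOMET release for exact names), and the source/translation/reference triplet is made up.

```python
# Scoring MT output with a COMET-style metric via the `unbabel-comet`
# package. The checkpoint ID below is an assumption -- consult the
# AfriCOMET release for the exact name; the example triplet is made up.
from comet import download_model, load_from_checkpoint

model_path = download_model("masakhane/africomet-stl")  # assumed checkpoint ID
model = load_from_checkpoint(model_path)

data = [{
    "src": "Ina kwana?",                 # source sentence (Hausa)
    "mt":  "How did you sleep?",         # machine translation
    "ref": "Good morning, how are you?"  # human reference
}]
out = model.predict(data, batch_size=8, gpus=0)
print(out.scores)        # per-segment scores
print(out.system_score)  # corpus-level score
```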
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
Muhammad, Shamsuddeen Hassan, Abdulmumin, Idris, Ayele, Abinew Ali, Ousidhoum, Nedjma, Adelani, David Ifeoluwa, Yimam, Seid Muhie, Ahmad, Ibrahim Sa'id, Beloucif, Meriem, Mohammad, Saif M., Ruder, Sebastian, Hourrane, Oumaima, Brazdil, Pavel, Ali, Felermino Dário Mário António, David, Davis, Osei, Salomey, Bello, Bello Shehu, Ibrahim, Falalu, Gwadabe, Tajuddeen, Rutunda, Samuel, Belay, Tadesse, Messelle, Wendimu Baye, Balcha, Hailu Beshada, Chala, Sisay Adugna, Gebremichael, Hagos Tesfahun, Opoku, Bernard, Arthur, Steven
Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents. These include 75 languages with at least one million speakers each. Yet, little NLP research has been conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of more than 110,000 tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families. The tweets were annotated by native speakers and used in the AfriSenti-SemEval shared task, which attracted over 200 participants (see https://afrisenti-semeval.github.io). We describe the data collection methodology, the annotation process, and the challenges we faced when curating each dataset. We further report baseline experiments conducted on the different datasets and discuss their usefulness.
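In the spirit of the reported baselines, here is a tiny three-way sentiment classifier with a weighted-F1 evaluation of the kind commonly used for imbalanced sentiment data. The tweets are invented Hausa-like placeholders and the pipeline is illustrative, not the paper's system.

```python
# A tiny 3-way sentiment baseline, illustrative only. The tweets are
# invented placeholders; the real data comes from the AfriSenti release.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "Wannan fim din yana da kyau sosai!",  # placeholder "positive" tweet
    "Ban ji dadin wannan labarin ba.",     # placeholder "negative" tweet
    "Taron zai fara da karfe goma.",       # placeholder "neutral" tweet
]
train_labels = ["positive", "negative", "neutral"]

# Character n-grams cope with rich morphology and noisy social-media text.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LinearSVC(),
)
clf.fit(train_texts, train_labels)

test_texts = ["Na gode, aikin ya yi kyau."]  # placeholder test tweet
test_labels = ["positive"]
pred = clf.predict(test_texts)
print(pred, f1_score(test_labels, pred, average="weighted"))
```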