
WolBanking77: Wolof Banking Speech Intent Classification Dataset

Kandji, Abdou Karim, Precioso, Frédéric, Ba, Cheikh, Ndiaye, Samba, Ndione, Augustin

arXiv.org Artificial Intelligence

Intent classification models have made significant progress in recent years. However, previous studies primarily focus on high-resource language datasets, which leaves a gap for low-resource languages and for regions with high rates of illiteracy, where languages are more often spoken than read or written. This is the case in Senegal, for example, where Wolof is spoken by around 90% of the population while the national illiteracy rate remains at 42%. Wolof is spoken by more than 10 million people in the West African region. To address these limitations, we introduce the Wolof Banking Speech Intent Classification Dataset (WolBanking77) for academic research in intent classification. WolBanking77 currently contains 9,791 text sentences in the banking domain and more than 4 hours of spoken sentences. We conduct experiments on various baselines, including state-of-the-art text and voice models, and the results on the current dataset are very promising. In addition, this paper presents an in-depth examination of the dataset's contents. We report baseline F1-scores and word error rates for NLP and ASR models, respectively, trained on the WolBanking77 dataset, as well as comparisons between models. Dataset and code available at: https://github.com/abdoukarim/wolbanking77.
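As an illustration of the word error rate (WER) metric reported above, here is a minimal, generic sketch (not the paper's evaluation code) computing WER as word-level edit distance divided by reference length; the example sentences are hypothetical:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution ("begg" for "bëgg") out of three reference words.
print(wer("damay bëgg xaalis", "damay begg xaalis"))
```

Lower is better; a WER of 0.0 means the hypothesis matches the reference exactly.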


Sentiment Analysis on the young people's perception about the mobile Internet costs in Senegal

Mbaye, Derguene, Seye, Madoune Robert, Diallo, Moussa, Ndiaye, Mamadou Lamine, Sow, Djiby, Adjanohoun, Dimitri Samuel, Mbengue, Tatiana, Wade, Cheikh Samba, Pablo, De Roulet, Munyaka, Jean-Claude Baraka, Chenal, Jerome

arXiv.org Artificial Intelligence

Internet penetration rates in Africa are rising steadily, and mobile Internet is getting an even bigger boost from the availability of smartphones. Young people are increasingly using the Internet, especially social networks, and Senegal is no exception to this revolution. Social networks have become the main means of expression for young people. Despite this evolution in Internet access, there are few operators on the market, which limits the alternatives available in terms of value for money. In this paper, we look at how young people feel about the price of mobile Internet in Senegal, relative to the perceived quality of service, through their comments on social networks. We collected a set of Twitter and Facebook comments related to the subject and applied a sentiment analysis model to gauge their overall sentiment.


Task-Oriented Dialog Systems for the Senegalese Wolof Language

Mbaye, Derguene, Diallo, Moussa

arXiv.org Artificial Intelligence

Recent years have seen considerable interest in conversational agents with the rise of large language models (LLMs). Although they offer considerable advantages, LLMs also present significant risks, such as hallucination, which hinder their widespread deployment in industry. Moreover, low-resource languages such as African ones are still underrepresented in these systems, limiting their performance in these languages. In this paper, we illustrate a more classical approach based on the modular architecture of task-oriented dialog systems (ToDS), which offers better control over outputs. We propose a chatbot generation engine based on the Rasa framework and a robust methodology for projecting annotations onto the Wolof language using an in-house machine translation system. When evaluated on a generated chatbot trained on the Amazon Massive dataset, our Wolof intent classifier performs similarly to the one obtained for French, a resource-rich language. We also show that this approach extends to other low-resource languages, thanks to the intent classifier's language-agnostic pipeline, simplifying the design of chatbots in these languages.


Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal

Gauthier, Elodie, Ndiaye, Aminata, Guissé, Abdoulaye

arXiv.org Artificial Intelligence

This work is part of the Kallaama project, whose objective is to produce and disseminate corpora of national languages for the development of speech technologies in the field of agriculture. Except for Wolof, which benefits from some language data for natural language processing, the national languages of Senegal are largely ignored by language technology providers. Yet such technologies are key to the protection, promotion, and teaching of these languages. Kallaama focuses on the three languages most widely spoken by Senegalese people: Wolof, Pulaar and Sereer. These languages are widely spoken by the population, with around 10 million native Senegalese speakers, not to mention those outside the country. However, they remain under-resourced in terms of machine-readable data that can be used for automatic processing and language technologies, all the more so in the agricultural sector. We release a transcribed speech dataset containing 125 hours of recordings about agriculture in the above-mentioned languages. These resources are specifically designed for automatic speech recognition purposes, including traditional approaches. To support the building of such technologies, we also provide textual corpora in Wolof and Pulaar, and a pronunciation lexicon containing 49,132 entries from the Wolof dataset.


Proof of Concept of a Voicebot Conversing in Wolof

Gauthier, Elodie, Wade, Papa-Séga, Moudenc, Thierry, Collen, Patrice, De Neef, Emilie, Ba, Oumar, Cama, Ndeye Khoyane, Kebe, Cheikh Ahmadou Bamba, Gningue, Ndeye Aissatou, Aristide, Thomas Mendo'o

arXiv.org Artificial Intelligence

This paper presents a proof of concept of the first automatic voice assistant ever built for Wolof, the main vehicular language spoken in Senegal. This voicebot is the result of a collaborative research project between Orange Innovation in France, Orange Senegal (aka Sonatel) and ADNCorp, a small IT company based in Dakar, Senegal. The purpose of the voicebot is to provide information to Orange customers about the Sargal loyalty program of Orange Senegal using the most natural means of communication: speech. The voicebot takes as input the customer's spoken request, which is then processed by a spoken language understanding (SLU) system, and replies to the customer using audio recordings. The first results of this proof of concept are encouraging: we achieved a WER of 22% on the ASR task and an F1-score of 78% on the NLU task.


MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages

Dione, Cheikh M. Bamba, Adelani, David, Nabende, Peter, Alabi, Jesujoba, Sindane, Thapelo, Buzaaba, Happy, Muhammad, Shamsuddeen Hassan, Emezue, Chris Chinenye, Ogayo, Perez, Aremu, Anuoluwapo, Gitau, Catherine, Mbaye, Derguene, Mukiibi, Jonathan, Sibanda, Blessing, Dossou, Bonaventure F. P., Bukula, Andiswa, Mabuya, Rooweither, Tapo, Allahsera Auguste, Munkoh-Buabeng, Edwin, Koagne, victoire Memdjokam, Kabore, Fatoumata Ouoba, Taylor, Amelia, Kalipe, Godson, Macucwa, Tebogo, Marivate, Vukosi, Gwadabe, Tajuddeen, Elvis, Mboning Tchiaze, Onyenwe, Ikechukwu, Atindogbe, Gratien, Adelani, Tolulope, Akinade, Idris, Samuel, Olanrewaju, Nahimana, Marien, Musabeyezu, Théogène, Niyomutabazi, Emile, Chimhenga, Ester, Gotosa, Kudzai, Mizha, Patrick, Agbolo, Apelete, Traore, Seydou, Uchechukwu, Chinedu, Yusuf, Aliyu, Abdullahi, Muhammad, Klakow, Dietrich

arXiv.org Artificial Intelligence

In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges of annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using conditional random fields and several multilingual pre-trained language models, and applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s), in both single-source and multi-source setups, greatly improves POS tagging performance for the target languages, in particular when combined with cross-lingual parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties appears more effective for POS tagging in unseen languages.


Beqi: Revitalize the Senegalese Wolof Language with a Robust Spelling Corrector

Mbaye, Derguene, Diallo, Moussa

arXiv.org Artificial Intelligence

The progress of Natural Language Processing (NLP), although fast in recent years, has not been at the same pace for all languages. African languages in particular are still behind and lack automatic processing tools. Some of these tools are very important for the development of these languages and also play an important role in many NLP applications. This is particularly the case for automatic spell checkers. Several approaches to this task have been studied, and the one that models spelling correction as a translation task from misspelled (noisy) text to well-spelled (correct) text shows promising results. However, this approach requires a parallel corpus of noisy data on one side and correct data on the other, whereas Wolof is a low-resource language and has no such corpus. In this paper, we address the lack of data by generating synthetic data, and we present sequence-to-sequence deep learning models for spelling correction in Wolof. We evaluated these models in three different scenarios depending on the subwording method applied to the data and showed that the subwording method has a significant impact on model performance, which opens the way for future research on Wolof spelling correction.
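As a rough illustration of the synthetic-data idea described above, the following sketch (our own hypothetical example, not the paper's actual noise model) corrupts well-spelled sentences with random character-level edits to produce (noisy, correct) pairs for training a correction model:

```python
import random

def add_noise(sentence: str, p: float = 0.1, seed: int = 0) -> str:
    """Corrupt a well-spelled sentence with random character-level edits
    (deletion, swap, substitution) to build a synthetic parallel corpus."""
    rng = random.Random(seed)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c != " " and rng.random() < p:
            op = rng.choice(["delete", "swap", "substitute"])
            if op == "delete":
                i += 1           # drop this character
                continue
            if op == "swap" and i + 1 < len(chars) and chars[i + 1] != " ":
                out.extend([chars[i + 1], c])  # transpose adjacent characters
                i += 2
                continue
            # substitute with a random letter (including Wolof diacritics)
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyzëéàñŋ"))
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)

clean = "ndax mën nga ma jàppale"   # hypothetical well-spelled sentence
noisy = add_noise(clean, p=0.15, seed=42)
print((noisy, clean))               # one synthetic (source, target) pair
```

A real pipeline would draw noise from observed error distributions (e.g. dropped diacritics), but this shows how a parallel corpus can be bootstrapped from monolingual correct text.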


Low-Resourced Machine Translation for Senegalese Wolof Language

Mbaye, Derguene, Diallo, Moussa, Diop, Thierno Ibrahima

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) research has made great advances in recent years, with major breakthroughs that have established new benchmarks. However, these advances have mainly benefited a group of languages commonly referred to as resource-rich, such as English and French. The majority of other, less-resourced languages are left behind, which is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 123,000 sentences on which we conducted experiments with machine translation models based on recurrent neural networks (RNNs) in different data configurations. We observed performance gains with models trained on subworded data, as well as with those trained on the French-English language pair compared to those trained on the French-Wolof pair under the same experimental conditions.


Investigating data partitioning strategies for crosslinguistic low-resource ASR evaluation

Liu, Zoey, Spence, Justin, Prud'hommeaux, Emily

arXiv.org Artificial Intelligence

Many automatic speech recognition (ASR) datasets include a single pre-defined test set consisting of one or more speakers whose speech never appears in the training set. This "hold-speaker(s)-out" data partitioning strategy, however, may not be ideal for datasets in which the number of speakers is very small. This study investigates ten different data split methods for five languages with minimal ASR training resources. We find that (1) model performance varies greatly depending on which speaker is selected for testing; (2) the average word error rate (WER) across all held-out speakers is comparable not only to the average WER over multiple random splits but also to any given individual random split; (3) WER is also generally comparable when the data is split heuristically or adversarially; (4) utterance duration and intensity are comparatively more predictive of variability regardless of the data split. These results suggest that the widely used hold-speakers-out approach to ASR data partitioning can yield results that do not reflect model performance on unseen data or speakers. Random splits can yield more reliable and generalizable estimates when facing data sparsity.
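The two partitioning strategies compared above can be sketched as follows; this is an illustrative toy implementation under our own assumptions, not the study's code:

```python
import random

def hold_speaker_out(utterances, test_speaker):
    """Hold-speaker(s)-out split: every utterance of one speaker goes to test,
    so the test speaker is never seen during training."""
    train = [u for u in utterances if u["speaker"] != test_speaker]
    test = [u for u in utterances if u["speaker"] == test_speaker]
    return train, test

def random_split(utterances, test_frac=0.2, seed=0):
    """Random utterance-level split, ignoring speaker identity entirely."""
    rng = random.Random(seed)
    shuffled = utterances[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

# Tiny hypothetical corpus: five utterances from three speakers.
corpus = [{"speaker": s, "text": f"utt{i}"}
          for i, s in enumerate(["A", "A", "B", "B", "C"])]

train, test = hold_speaker_out(corpus, "C")
print(len(train), len(test))  # 4 1
```

The study's point can be reproduced at small scale by evaluating a model once per held-out speaker and comparing the spread of WERs to that over several seeds of `random_split`.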


MasakhaNER: Named Entity Recognition for African Languages

Adelani, David Ifeoluwa, Abbott, Jade, Neubig, Graham, D'souza, Daniel, Kreutzer, Julia, Lignos, Constantine, Palen-Michel, Chester, Buzaaba, Happy, Rijhwani, Shruti, Ruder, Sebastian, Mayhew, Stephen, Azime, Israel Abebe, Muhammad, Shamsuddeen, Emezue, Chris Chinenye, Nakatumba-Nabende, Joyce, Ogayo, Perez, Aremu, Anuoluwapo, Gitau, Catherine, Mbaye, Derguene, Alabi, Jesujoba, Yimam, Seid Muhie, Gwadabe, Tajuddeen, Ezeani, Ignatius, Niyongabo, Rubungo Andre, Mukiibi, Jonathan, Otiende, Verrah, Orife, Iroro, David, Davis, Ngom, Samba, Adewumi, Tosin, Rayson, Paul, Adeyemi, Mofetoluwa, Muriuki, Gerald, Anebi, Emmanuel, Chukwuneke, Chiamaka, Odu, Nkiruka, Wairagala, Eric Peter, Oyerinde, Samuel, Siro, Clemencia, Bateesa, Tobius Saul, Oloyede, Temilola, Wambui, Yvonne, Akinode, Victor, Nabagereka, Deborah, Katusiime, Maurice, Awokoya, Ayodele, MBOUP, Mouhamadane, Gebreyohannes, Dibora, Tilaye, Henok, Nwaike, Kelechi, Wolde, Degaga, Faye, Abdoulaye, Sibanda, Blessing, Ahia, Orevaoghene, Dossou, Bonaventure F. P., Ogueji, Kelechi, DIOP, Thierno Ibrahima, Diallo, Abdoulaye, Akinfaderin, Adewale, Marengereke, Tendai, Osei, Salomey

arXiv.org Artificial Intelligence

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. We release the data, code, and models in order to inspire future research on African NLP.