Tunisian Arabic
How Well Do LLMs Understand Tunisian Arabic?
Large Language Models (LLMs) are the engines driving today's AI agents. The better these models understand human languages, the more natural and user-friendly the interaction with AI becomes, from everyday devices like computers and smartwatches to any tool that can act intelligently. Yet, the ability of industrial-scale LLMs to comprehend low-resource languages, such as Tunisian Arabic (Tunizi), is often overlooked. This neglect risks excluding millions of Tunisians from fully interacting with AI in their own language, pushing them toward French or English. Such a shift not only threatens the preservation of the Tunisian dialect but may also create challenges for literacy and influence younger generations to favor foreign languages. In this study, we introduce a novel dataset containing parallel Tunizi, standard Tunisian Arabic, and English translations, along with sentiment labels. We benchmark several popular LLMs on three tasks: transliteration, translation, and sentiment analysis. Our results reveal significant differences between models, highlighting both their strengths and limitations in understanding and processing Tunisian dialects. By quantifying these gaps, this work underscores the importance of including low-resource languages in the next generation of AI systems, ensuring technology remains accessible, inclusive, and culturally grounded.
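The three benchmark tasks described above boil down to comparing model outputs against gold references over parallel examples. A minimal sketch of such an evaluation harness is shown below; the field names, toy data, and the `dummy_model` stand-in are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of scoring a model on parallel Tunizi data by
# exact-match accuracy. All names and data here are hypothetical.

def evaluate(model, examples, task):
    """Return exact-match accuracy of `model` on `examples` for `task`."""
    correct = 0
    for ex in examples:
        prediction = model(task, ex["input"])
        if prediction.strip().lower() == ex["target"].strip().lower():
            correct += 1
    return correct / len(examples)

# Toy parallel data: Tunizi input paired with a sentiment label.
sentiment_examples = [
    {"input": "behi barcha!", "target": "positive"},
    {"input": "mouch normal el khedma hedhi", "target": "negative"},
]

def dummy_model(task, text):
    # Stand-in for an LLM call; always predicts "positive".
    return "positive"

print(evaluate(dummy_model, sentiment_examples, "sentiment"))  # 0.5
```

In practice, exact match suits sentiment labels, while transliteration and translation would use softer metrics such as character error rate or chrF.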
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
TEDxTN: A Three-way Speech Translation Corpus for Code-Switched Tunisian Arabic - English
Bougares, Fethi, Mdhaffar, Salima, Elleuch, Haroun, Estève, Yannick
In this paper, we introduce TEDxTN, the first publicly available Tunisian Arabic to English speech translation dataset. This work is in line with the ongoing effort to mitigate the data scarcity obstacle for a number of Arabic dialects. We collected, segmented, transcribed, and translated 108 TEDx talks following our internally developed annotation guidelines. The collected talks represent 25 hours of code-switched speech covering speakers with various accents from over 11 different regions of Tunisia. We make the annotation guidelines and corpus publicly available, which will enable the extension of TEDxTN to new talks as they become available. We also report results for strong baseline systems for Speech Recognition and Speech Translation using multiple pre-trained and fine-tuned end-to-end models. This corpus is the first open-source, publicly available speech translation corpus of the code-switched Tunisian dialect. We believe it is a valuable resource that can motivate and facilitate further research on the natural language processing of the Tunisian dialect.
- Africa > Middle East > Tunisia (0.25)
- Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.05)
- Africa > Sudan (0.04)
Mufu: Multilingual Fused Learning for Low-Resource Translation with LLM
Lim, Zheng Wei, Gupta, Nitish, Yu, Honglin, Cohn, Trevor
Multilingual large language models (LLMs) are great translators, but this is largely limited to high-resource languages. For many LLMs, translating in and out of low-resource languages remains a challenging task. To maximize data efficiency in this low-resource setting, we introduce Mufu, which includes a selection of automatically generated multilingual candidates and an instruction to correct inaccurate translations in the prompt. Mufu prompts turn a translation task into a post-editing one, and seek to harness the LLM's reasoning capability with auxiliary translation candidates, from which the model is required to assess the input quality, align the semantics cross-lingually, copy from relevant inputs, and override instances that are incorrect. Our experiments on En-XX translations over the Flores-200 dataset show that LLMs finetuned on Mufu-style prompts are robust to poor-quality auxiliary translation candidates, achieving performance superior to the NLLB 1.3B distilled model in 64% of low- and very-low-resource language pairs. We then distill these models to reduce inference cost, while maintaining on average a 3.1 chrF improvement over a finetune-only baseline in low-resource translations. This performance gap is caused primarily by scant pre-training data in these languages (Wei et al., 2023; Yuan et al., 2024; Alves et al., 2024), and is difficult to overcome despite growing efforts to support translations of long-tail languages (Kudugunta et al., 2024; Bapna et al., 2022; Lu et al., 2024).
In this work, we introduce multilingual fused learning (Mufu), which combines multilingual context and a post-editing task when translating into lower-resource languages using LLMs. Mufu-style prompts (see Table 1, top block) include several multilingual translation candidates along with a post-editing target, from which a model learns "in-context" to translate from languages with which the target language is more closely aligned due to cultural relevance and geographical and genealogical proximity. We rely on a larger, more competent multilingual teacher model to generate auxiliary translations in these languages, which help disambiguate inputs and improve cross-lingual semantic alignment in a translation task.
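The prompt construction described above can be sketched as follows. The exact template is an assumption made for illustration (the paper's real format is in its Table 1), and the example languages and translations are placeholders.

```python
# Sketch of assembling a Mufu-style post-editing prompt: auxiliary
# translation candidates in related languages, plus an instruction
# to correct an initial draft in the target language. The template
# wording below is hypothetical, not the paper's actual format.

def build_mufu_prompt(source_text, source_lang, target_lang,
                      candidates, draft):
    """candidates: {language_name: candidate_translation}, produced
    by a stronger multilingual teacher model."""
    lines = [f"{source_lang} source: {source_text}"]
    for lang, text in candidates.items():
        lines.append(f"{lang} candidate: {text}")
    lines.append(f"{target_lang} draft: {draft}")
    lines.append(
        f"Using the candidates above, post-edit the draft into a "
        f"correct {target_lang} translation:"
    )
    return "\n".join(lines)

prompt = build_mufu_prompt(
    "The weather is nice today.", "English", "Acehnese",
    {"Indonesian": "Cuacanya bagus hari ini.",
     "Malay": "Cuaca baik hari ini."},
    "Cuaca jroh uroe nyoe.",
)
print(prompt)
```

Choosing candidate languages that are culturally, geographically, or genealogically close to the target (here, Indonesian and Malay for Acehnese) is what lets the model copy from relevant inputs rather than translate from scratch.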
- Africa > Kenya (0.06)
- North America > United States (0.06)
- Asia > Myanmar (0.06)
Normalized Orthography for Tunisian Arabic
Turki, Houcemeddine, Ellouze, Kawthar, Ammar, Hager Ben, Taieb, Mohamed Ali Hadj, Adel, Imed, Aouicha, Mohamed Ben, Farri, Pier Luigi, Bennour, Abderrezak
Tunisian Arabic (ISO 639-3: aeb) is a distinct variety native to Tunisia, derived from Arabic and enriched by various historical influences. This research introduces the "Normalized Orthography for Tunisian Arabic" (NOTA), an adaptation of the CODA* guidelines for transcribing Tunisian Arabic using Arabic script. The aim is to enhance language resource development by ensuring user-friendliness and consistency. The updated standard addresses challenges in accurately representing Tunisian phonology and morphology, correcting issues in transcriptions based on Modern Standard Arabic.
- Europe > Austria > Vienna (0.14)
- Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.05)
- Africa > Middle East > Tunisia > Sousse Governorate > Sousse (0.04)
A Multilingual African Embedding for FAQ Chatbots
Mabrouk, Aymen Ben Elhaj, Hmida, Moez Ben Haj, Fourati, Chayma, Haddad, Hatem, Messaoudi, Abir
Searching for available, reliable, official, and understandable information is not a trivial task, due to information being scattered across the internet and the lack of governmental communication channels that communicate in African dialects and languages. In this paper, we introduce an Artificial-Intelligence-powered chatbot for crisis communication that is omnichannel, multilingual, and multi-dialectal. We present our work on a modified StarSpace embedding tailored to African dialects for the question-answering task, along with the architecture of the proposed chatbot system and a description of its different layers. English, French, Arabic, Tunisian, Igbo, Yorùbá, and Hausa are used as languages and dialects. Quantitative and qualitative evaluation results are obtained for our deployed COVID-19 chatbot. Results show that users are satisfied and that conversations with the chatbot meet their needs.
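The core retrieval step of such an embedding-based FAQ chatbot can be sketched as follows: questions and queries are embedded in a shared space, and the answer whose question is closest to the user query is returned. The bag-of-words embedder below is a toy stand-in for a trained StarSpace-style model, and the FAQ data is invented for illustration.

```python
# Sketch of embedding-based FAQ retrieval. The `embed` function is a
# hypothetical bag-of-words stand-in for a trained StarSpace-style
# embedding; a real system would map text to dense learned vectors.
import math
from collections import Counter

def embed(text):
    # Toy embedding: lowercase bag-of-words counts as a sparse vector.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def answer(query, faq):
    """faq: list of (question, answer) pairs; return the answer whose
    question is most similar to the query."""
    q = embed(query)
    return max(faq, key=lambda pair: cosine(q, embed(pair[0])))[1]

faq = [
    ("What are the symptoms?", "Fever, cough, and fatigue."),
    ("Where can I get vaccinated?", "At your regional health center."),
]
print(answer("symptoms of the virus?", faq))  # Fever, cough, and fatigue.
```

Multilingual support then comes from the embedding itself: if questions in English, French, Arabic, or Tunisian are mapped into the same space, the same retrieval loop serves all of them unchanged.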
- Africa > Nigeria (0.06)
- Africa > Middle East > Tunisia (0.05)
- North America > United States > Indiana (0.04)