Kheir, Yassine El
MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies
Asgari, Ehsaneddin, Kheir, Yassine El, Javaheri, Mohammad Ali Sadraei
Tokenization is fundamental to Natural Language Processing (NLP), directly impacting model efficiency and linguistic fidelity. While Byte Pair Encoding (BPE) is widely used in Large Language Models (LLMs), it often disregards morpheme boundaries, leading to suboptimal segmentation, particularly in morphologically rich languages. We introduce MorphBPE, a morphology-aware extension of BPE that integrates linguistic structure into subword tokenization while preserving statistical efficiency. Additionally, we propose two morphology-based evaluation metrics: (i) Morphological Consistency F1-Score, which quantifies the consistency between morpheme sharing and token sharing and contributes to LLM training convergence, and (ii) Morphological Edit Distance, which measures the alignment between morphemes and tokens with respect to interpretability. Experiments on English, Russian, Hungarian, and Arabic with 300M- and 1B-parameter LLMs demonstrate that MorphBPE consistently reduces cross-entropy loss, accelerates convergence, and improves morphological alignment scores. Fully compatible with existing LLM pipelines, MorphBPE requires minimal modifications for integration. The MorphBPE codebase and tokenizer playground will be available at: https://github.com/llm-lab-org/MorphBPE and https://tokenizer.llm-lab.org
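As a rough illustration of the second metric, the sketch below (not the paper's reference implementation; the segment-level formulation is an assumption) computes a Levenshtein distance between a word's gold morpheme segmentation and the segments produced by a tokenizer, treating each segment as one symbol; a lower distance means the tokenization tracks the morphology more closely.

def edit_distance(a, b):
    """Standard Levenshtein distance over two sequences of segments."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(a)][len(b)]

# Hypothetical example: gold morphemes vs. BPE tokens for "unhappiness".
gold_morphemes = ["un", "happi", "ness"]
bpe_tokens = ["unh", "app", "iness"]
print(edit_distance(gold_morphemes, bpe_tokens))  # 3: no segments align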
Fanar: An Arabic-Centric Multimodal Generative AI Platform
Fanar Team, Abbas, Ummar, Ahmad, Mohammad Shahmeer, Alam, Firoj, Altinisik, Enes, Asgari, Ehsaneddin, Boshmaf, Yazan, Boughorbel, Sabri, Chawla, Sanjay, Chowdhury, Shammur, Dalvi, Fahim, Darwish, Kareem, Durrani, Nadir, Elfeky, Mohamed, Elmagarmid, Ahmed, Eltabakh, Mohamed, Fatehkia, Masoomali, Fragkopoulos, Anastasios, Hasanain, Maram, Hawasly, Majd, Husaini, Mus'ab, Jung, Soon-Gyo, Lucas, Ji Kim, Magdy, Walid, Messaoud, Safa, Mohamed, Abubakr, Mohiuddin, Tasnim, Mousi, Basel, Mubarak, Hamdy, Musleh, Ahmad, Naeem, Zan, Ouzzani, Mourad, Popovic, Dorde, Sadeghi, Amin, Sencar, Husrev Taha, Shinoy, Mohammed, Sinan, Omar, Zhang, Yifan, Ali, Ahmed, Kheir, Yassine El, Ma, Xiaosong, Ruan, Chaoyi
We present Fanar, a platform for Arabic-centric multimodal generative AI systems that supports language, speech, and image generation tasks. At the heart of Fanar are Fanar Star and Fanar Prime, two highly capable Arabic Large Language Models (LLMs) that are best in class on well-established benchmarks for similarly sized models. Fanar Star is a 7B (billion) parameter model trained from scratch on nearly 1 trillion clean and deduplicated Arabic, English, and code tokens. Fanar Prime is a 9B parameter model continually trained from the Gemma-2 9B base model on the same 1 trillion token set. Both models are concurrently deployed and address different types of prompts, which are transparently routed through a custom-built orchestrator. The Fanar platform provides many other capabilities, including a customized Islamic Retrieval Augmented Generation (RAG) system for handling religious prompts and a Recency RAG for summarizing information about current or recent events that occurred after the pre-training data cut-off date. The platform also offers additional cognitive capabilities, including in-house bilingual speech recognition that supports multiple Arabic dialects, as well as voice and image generation fine-tuned to better reflect regional characteristics. Finally, Fanar provides an attribution service that can be used to verify the authenticity of fact-based generated content. The design, development, and implementation of Fanar were entirely undertaken at Hamad Bin Khalifa University's Qatar Computing Research Institute (QCRI) and sponsored by Qatar's Ministry of Communications and Information Technology to enable sovereign AI technology development.
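Purely as an illustration of the orchestration idea (the abstract does not describe the actual routing logic, so the rules, backend names, and thresholds below are assumptions), a minimal prompt router might look like this:

def route(prompt: str) -> str:
    """Return the name of a hypothetical backend for a given prompt."""
    lowered = prompt.lower()
    if any(k in lowered for k in ("quran", "hadith", "fatwa")):
        return "islamic_rag"      # customized Islamic RAG system
    if any(k in lowered for k in ("today", "latest", "news")):
        return "recency_rag"      # RAG for events after the data cut-off
    if len(lowered.split()) > 200:
        return "fanar_prime"      # larger 9B model for long, complex prompts
    return "fanar_star"           # default 7B from-scratch model

print(route("What is the latest news about the World Cup?"))  # recency_rag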
CAFE: A Novel Code-Switching Dataset for Algerian Dialect, French, and English
Lachemat, Houssam Eddine-Othman, Abbas, Akli, Oukas, Nourredine, Kheir, Yassine El, Haboussi, Samia, Chowdhury, Shammur Absar
The paper introduces and publicly releases (data download link available after acceptance) CAFE -- the first code-switching dataset between the Algerian dialect, French, and English. The CAFE speech data is unique in that (a) it captures spontaneous, in vivo human-human conversation, including phenomena such as code-switching and overlapping speech; (b) it addresses distinct linguistic challenges of the North African Arabic dialect; and (c) it covers dialectal variations from various parts of Algeria within different sociolinguistic contexts. CAFE contains approximately 37 hours of speech, with a subset, CAFE-small, of 2 hours and 36 minutes released with manual human annotation, including speech segmentation, transcription, explicit annotation of code-switching points, overlapping speech, and other events such as noise and laughter. The remaining approximately 34.58 hours contain pseudo-label transcriptions. In addition to the data release, the paper highlights the challenges of using state-of-the-art Automatic Speech Recognition (ASR) models such as Whisper large-v2/v3 and PromptingWhisper to handle such content. We then benchmark CAFE with the aforementioned Whisper models and show how well-designed data processing pipelines and advanced decoding techniques can improve ASR performance, reaching a Mixed Error Rate (MER) of 0.310, a Character Error Rate (CER) of 0.329, and a Word Error Rate (WER) of 0.538.
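For readers who want to reproduce a baseline pass over such data, a minimal sketch with the openai-whisper package is shown below; the file name is hypothetical and the paper's exact decoding configuration is not reflected here.

import whisper  # the openai-whisper package

# Load the large-v2 checkpoint and transcribe a hypothetical CAFE segment.
# For code-switched speech, the language hint can be omitted so the model
# detects the language on its own.
model = whisper.load_model("large-v2")
result = model.transcribe("cafe_segment_001.wav")
print(result["text"])

# PromptingWhisper-style runs additionally bias decoding with a text prompt,
# e.g. model.transcribe("cafe_segment_001.wav", initial_prompt="...");
# the prompts used in the paper are not specified here.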
Automatic Pronunciation Assessment -- A Review
Kheir, Yassine El, Ali, Ahmed, Chowdhury, Shammur Absar
Pronunciation assessment and its application in computer-aided pronunciation training (CAPT) have seen impressive progress in recent years. With the rapid growth in language processing and deep learning over the past few years, there is a need for an updated review. In this paper, we review methods employed in pronunciation assessment for both phonemic and prosodic aspects. We categorize the main challenges observed in prominent research trends and highlight existing limitations and available resources. This is followed by a discussion of the remaining challenges and possible directions for future work.
L1-aware Multilingual Mispronunciation Detection Framework
Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
The phonological discrepancies between a speaker's native (L1) and non-native (L2) language serve as a major factor in mispronunciation. This paper introduces a novel multilingual mispronunciation detection and diagnosis (MDD) architecture, L1-MultiMDD, enriched with L1-aware speech representation. An end-to-end speech encoder is trained on the input signal and its corresponding reference phoneme sequence. First, an attention mechanism is deployed to align the input audio with the reference phoneme sequence. Afterwards, L1-L2 speech embeddings are extracted from an auxiliary model, pretrained in a multi-task setup to identify the L1 and L2 language, and are infused into the primary network. Finally, L1-MultiMDD is optimized for a unified multilingual phoneme recognition task using connectionist temporal classification (CTC) loss for the target languages: English, Arabic, and Mandarin. Our experiments demonstrate the effectiveness of the proposed L1-MultiMDD framework on both seen (L2-ARTIC, LATIC, and AraVoiceL2v2) and unseen (EpaDB and Speechocean762) datasets. The consistent gains in phoneme error rate (PER) and false rejection rate (FRR) across all target languages confirm our approach's robustness, efficacy, and generalizability.
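The sketch below is a schematic PyTorch rendering of the pipeline described above (speech encoding, attention alignment with the reference phonemes, infusion of an L1/L2 embedding, and CTC training); module choices, dimensions, and names are assumptions rather than the authors' implementation.

import torch
import torch.nn as nn

class L1AwareMDD(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, n_phonemes=100, l1_dim=64):
        super().__init__()
        self.speech_enc = nn.GRU(feat_dim, hidden, batch_first=True,
                                 bidirectional=True)
        self.phone_emb = nn.Embedding(n_phonemes, 2 * hidden)
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                          batch_first=True)
        self.fuse = nn.Linear(2 * hidden + l1_dim, 2 * hidden)
        self.head = nn.Linear(2 * hidden, n_phonemes + 1)  # +1 for CTC blank
        self.ctc = nn.CTCLoss(blank=n_phonemes, zero_infinity=True)

    def forward(self, feats, ref_phones, l1_emb, feat_lens, targets, target_lens):
        enc, _ = self.speech_enc(feats)                # (B, T, 2H) speech encoding
        ref = self.phone_emb(ref_phones)               # (B, L, 2H) reference phones
        aligned, _ = self.attn(enc, ref, ref)          # audio frames attend to phones
        l1 = l1_emb.unsqueeze(1).expand(-1, aligned.size(1), -1)
        fused = torch.tanh(self.fuse(torch.cat([aligned, l1], dim=-1)))
        log_probs = self.head(fused).log_softmax(-1)   # (B, T, n_phonemes + 1)
        return self.ctc(log_probs.transpose(0, 1), targets, feat_lens, target_lens)

# Illustrative usage with random tensors standing in for real features.
model = L1AwareMDD()
feats = torch.randn(2, 200, 80)                        # 2 utterances, 200 frames
ref_phones = torch.randint(0, 100, (2, 30))            # reference phoneme sequences
l1_emb = torch.randn(2, 64)                            # from the auxiliary L1/L2 model
loss = model(feats, ref_phones, l1_emb,
             feat_lens=torch.tensor([200, 200]),
             targets=torch.randint(0, 100, (2, 30)),   # annotated spoken phonemes
             target_lens=torch.tensor([30, 30]))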
The complementary roles of non-verbal cues for Robust Pronunciation Assessment
Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
Research on pronunciation assessment systems focuses on utilizing phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within non-verbal cues. In this study, we proposed a novel pronunciation assessment framework, IntraVerbalPA.
MyVoice: Arabic Speech Resource Collaboration Platform
Elshahawy, Yousseif, Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed
We introduce MyVoice, a crowdsourcing platform designed to collect Arabic speech to enhance dialectal speech technologies. The platform offers an opportunity to design large dialectal speech datasets and make them publicly available. MyVoice allows contributors to select a fine-grained, city/country-level dialect and record the displayed utterances. Users can switch roles between contributor and annotator. The platform incorporates a quality assurance system that filters out low-quality and spurious recordings before sending them for validation. During the validation phase, contributors can assess the quality of recordings, annotate them, and provide feedback, which is then reviewed by administrators. Furthermore, the platform gives admin roles the flexibility to add new data or tasks beyond dialectal speech and word collection, which are displayed to contributors, thus enabling collaborative efforts in gathering diverse and large-scale Arabic speech data.
SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation
Kheir, Yassine El, Chowdhury, Shammur Absar, Ali, Ahmed, Mubarak, Hamdy, Afzal, Shazia
The lack of labeled second-language (L2) speech data is a major challenge in designing mispronunciation detection models. We introduce SpeechBlender, a fine-grained data augmentation pipeline for generating mispronunciation errors to overcome this data scarcity. SpeechBlender utilizes a variety of masks to target different regions of phonetic units and uses mixing factors to linearly interpolate raw speech signals while augmenting pronunciation. The masks facilitate smooth blending of the signals, generating more effective samples than the `Cut/Paste' method. Our proposed technique achieves state-of-the-art results on Speechocean762 for ASR-dependent mispronunciation detection models at the phoneme level, with a 2.0% gain in Pearson Correlation Coefficient (PCC) compared to the previous state-of-the-art [1]. Additionally, we demonstrate a 5.0% improvement at the phoneme level compared to our baseline. We also observe a 4.6% increase in F1-score on the Arabic AraVoiceL2 test set.
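A minimal sketch of the blending idea follows: two raw speech signals are linearly interpolated inside a masked region with a mixing factor, and the mask edges are ramped so the transition is smooth. The mask shape, ramp length, and mixing factor below are illustrative assumptions, not the paper's exact configuration.

import numpy as np

def blend(good, bad, start, end, alpha=0.5, ramp=160):
    """Mix `bad` into `good` between samples [start, end) with weight alpha."""
    assert good.shape == bad.shape
    mask = np.zeros_like(good)
    mask[start:end] = alpha
    # Smooth the mask edges so the blended region fades in and out.
    mask[start:start + ramp] = np.linspace(0.0, alpha, ramp)
    mask[end - ramp:end] = np.linspace(alpha, 0.0, ramp)
    return (1.0 - mask) * good + mask * bad

# Hypothetical use: replace a phone region of a correct utterance with a
# blend of a mispronounced counterpart.
sr = 16000
good = np.random.randn(sr)   # stand-in for a correctly pronounced segment
bad = np.random.randn(sr)    # stand-in for a mispronounced segment
augmented = blend(good, bad, start=4000, end=8000, alpha=0.6)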
Benchmarking Arabic AI with Large Language Models
Abdelali, Ahmed, Mubarak, Hamdy, Chowdhury, Shammur Absar, Hasanain, Maram, Mousi, Basel, Boughorbel, Sabri, Kheir, Yassine El, Izham, Daniel, Dalvi, Fahim, Hawasly, Majd, Nazar, Nizi, Elshahawy, Yousseif, Ali, Ahmed, Durrani, Nadir, Milic-Frayling, Natasa, Alam, Firoj
With large Foundation Models (FMs), language technologies (AI in general) are entering a new paradigm: eliminating the need for developing large-scale task-specific datasets and supporting a variety of tasks through setups ranging from zero-shot to few-shot learning. However, understanding FMs' capabilities requires a systematic benchmarking effort that compares FM performance with state-of-the-art (SOTA) task-specific models. With that goal, past work has focused on the English language, with only a few efforts covering multiple languages. Our study contributes to ongoing research by evaluating FM performance on standard Arabic NLP and speech processing, covering a range of tasks from sequence tagging to content classification across diverse domains. We start with zero-shot learning using GPT-3.5-turbo, Whisper, and USM, addressing 33 unique tasks using 59 publicly available datasets, resulting in 96 test setups. For a few tasks, FMs perform on par with or exceed the SOTA models, but for the majority they under-perform. Given the importance of prompts for FM performance, we discuss our prompt strategies in detail and elaborate on our findings. Our future work on Arabic AI will explore few-shot prompting, expand the range of tasks, and investigate additional open-source models.
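The zero-shot setup can be pictured with the short sketch below, which sends one Arabic task instance to gpt-3.5-turbo and reads back a label; the prompt wording, label set, and example sentence are assumptions, not the study's exact prompts.

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def zero_shot_sentiment(text: str) -> str:
    # Task-specific instruction followed by the input instance.
    prompt = (
        "Classify the sentiment of the following Arabic sentence as "
        "Positive, Negative, or Neutral. Answer with a single word.\n\n"
        f"Sentence: {text}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

print(zero_shot_sentiment("الخدمة كانت ممتازة والطعام لذيذ"))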
QVoice: Arabic Speech Pronunciation Learning Application
Kheir, Yassine El, Khnaisser, Fouad, Chowdhury, Shammur Absar, Mubarak, Hamdy, Afzal, Shazia, Ali, Ahmed
This paper introduces QVoice, a novel Arabic pronunciation learning application powered by an end-to-end mispronunciation detection and feedback generation module. The application is designed to support non-native Arabic speakers in enhancing their pronunciation skills, while also helping native speakers mitigate any potential influence of regional dialects on their Modern Standard Arabic (MSA) pronunciation. QVoice employs various learning cues to aid learners in comprehending meaning and drawing connections with their existing knowledge of the English language, and it offers detailed feedback for pronunciation correction, along with contextual examples showcasing word usage. The learning cues featured in QVoice encompass a wide range of meaningful information, such as visualizations of phrases/words and their translations, as well as phonetic transcriptions and transliterations. QVoice provides pronunciation feedback at the character level and assesses performance at the word level.