SAGE: Spliced-Audio Generated Data for Enhancing Foundational Models in Low-Resource Arabic-English Code-Switched Speech Recognition

Jun-30-2025–arXiv.org Artificial Intelligence

ABSTRACT This paper investigates the performance of various self-supervised learning (SSL) speech models on dialectal Arabic (DA) and Arabic-English code-switched (CS) speech. To address data scarcity, a modified audio-splicing approach is introduced to generate artificial CS speech data. Fine-tuning an already fine-tuned SSL model with the proposed Spliced-Audio Generated (SAGE) data yields an absolute Word Error Rate (WER) improvement of 7.8% on Arabic and English CS benchmarks. Additionally, an Experience Replay (ER) inspired approach is proposed to enhance generalisation across DA and CS speech while mitigating catastrophic forgetting. Integrating an out-of-domain 3-gram language model reduces the overall mean WER from 31.7% to 26.6%. Few-shot fine-tuning on code-switching benchmarks further improves WER by 4.9%.
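
Since every result in the abstract is reported as WER, a minimal sketch of the metric may help when reading the numbers. It assumes the standard definition (word-level edit distance divided by the number of reference words) and whitespace tokenisation; the paper's exact scoring and text normalisation are not given here, and the function names and example sentences are hypothetical. The helper functions also show the difference between an absolute and a relative reduction: going from 31.7% to 26.6% is a drop of 5.1 points absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes whitespace tokenisation; real evaluations usually normalise
    text first (an assumption, not described in the abstract).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(wer_before: float, wer_after: float) -> float:
    """Difference in percentage points."""
    return wer_before - wer_after


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Reduction as a fraction of the starting WER."""
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion.
    print(word_error_rate("send the report today", "send a report"))
    # Mean WER figures quoted in the abstract.
    print(round(absolute_reduction(31.7, 26.6), 1))
    print(round(relative_reduction(31.7, 26.6), 3))
```

Because it splits on whitespace, the same function works unchanged on mixed Arabic-English transcripts, although script-specific normalisation would normally be applied before scoring.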

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Natural Language (1.00)
  - Speech > Speech Recognition (1.00)
