Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
Özyilmaz, Ömer Tarik, Coler, Matt, Valdenegro-Toro, Matias
arXiv.org Artificial Intelligence
Although commercial Arabic automatic speech recognition (ASR) systems support Modern Standard Arabic (MSA), they struggle with dialectal speech. We investigate the effect of fine-tuning OpenAI's Whisper on five major Arabic dialects (Gulf, Levantine, Iraqi, Egyptian, Maghrebi) using Mozilla Common Voice for MSA and the MASC dataset for dialectal speech. We evaluate MSA training size effects, benefits of pre-training on MSA data, and dialect-specific versus dialect-pooled models. We find that small amounts of MSA fine-tuning data yield substantial improvements for smaller models, matching larger non-fine-tuned models. While MSA pre-training shows minimal benefit, suggesting limited shared features between MSA and dialects, our dialect-pooled models perform comparably to dialect-specific ones. This indicates that pooling dialectal data, when properly balanced, can help address data scarcity in low-resource ASR without significant performance loss.
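The abstract does not spell out the evaluation metric, but ASR systems of this kind are conventionally scored by word error rate (WER): the word-level edit distance between the reference transcript and the model's hypothesis, divided by the reference length. A minimal sketch (the `wer` helper name is illustrative, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    # (substitutions, insertions, and deletions each cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

A WER of 0.0 means a perfect transcript; one substituted word in a three-word reference gives roughly 0.33.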
Sep-26-2025
- Country:
- Asia > Middle East
- Jordan (0.04)
- Palestine (0.04)
- Saudi Arabia (0.04)
- UAE (0.04)
- Europe
- Netherlands (0.05)
- United Kingdom (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Health & Medicine (0.68)