Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
Sun, Ling, Zhu, Charlotte, Shi, Shuju
–arXiv.org Artificial Intelligence
General-purpose ASR underperforms for atypical speakers, such as L2 learners, reinforcing bias and limiting use in education and accessibility. Using the CEFR-graded Speak and Improve corpus, we show that naive fine-tuning of Whisper reduces average WER but simultaneously widens disparities and disproportionately harms lower-level learners. To address this, we propose two strategies: (i) proficiency-aware multitask learning, jointly optimizing ASR with proficiency classification, and (ii) targeted augmentation, applying spectrogram masking to low-proficiency speech to counter imbalance. These approaches reduce WER by up to 29.4 percent (relative) and insertion/deletion errors by as much as 58.6 percent (relative). Crucially, despite the severe imbalance of the dataset reflecting real-world distributions, both strategies consistently narrow proficiency gaps, advancing equitable ASR for L2 learners.
arXiv.org Artificial Intelligence
Oct-14-2025
- Country:
- Europe
- France (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- North America > United States
- Indiana (0.04)
- Europe
- Genre:
- Research Report > New Finding (0.47)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Speech > Speech Recognition (0.98)
- Information Technology > Artificial Intelligence