One Whisper to Grade Them All
Phan, Nhan, Porwal, Anusha, Getman, Yaroslav, Voskoboinik, Ekaterina, Grósz, Tamás, Kurimo, Mikko
arXiv.org Artificial Intelligence
We present an efficient end-to-end approach for holistic Automatic Speaking Assessment (ASA) of multi-part second-language tests, developed for the 2025 Speak & Improve Challenge. Our system's main novelty is the ability to process all four spoken responses with a single Whisper-small encoder, combine all information via a lightweight aggregator, and predict the final score. This architecture removes the need for transcription and per-part models, cuts inference time, and makes ASA practical for large-scale Computer-Assisted Language Learning systems. Our system achieved a Root Mean Squared Error (RMSE) of 0.384, outperforming the text-based baseline (0.44) while using at most 168M parameters (about 70% of Whisper-small). Furthermore, we propose a data sampling strategy, allowing the model to train on only 44.8% of the speakers in the corpus and still reach 0.383 RMSE, demonstrating improved performance on imbalanced classes and strong data efficiency.
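The results above are reported as Root Mean Squared Error (RMSE) between predicted and reference holistic scores. As a minimal sketch of how that metric is computed, the snippet below uses only the Python standard library. The function name `rmse` and the score values are hypothetical illustrations, not data or code from the paper.

```python
import math


def rmse(predicted, reference):
    """Root Mean Squared Error between two equal-length score sequences."""
    if len(predicted) != len(reference):
        raise ValueError("score lists must have the same length")
    if not predicted:
        raise ValueError("score lists must not be empty")
    # Square each per-speaker error, average, then take the square root.
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical holistic scores for four speakers (illustrative only).
predicted_scores = [3.5, 4.0, 2.5, 5.0]
reference_scores = [3.0, 4.0, 3.0, 5.0]
print(round(rmse(predicted_scores, reference_scores), 4))  # prints 0.3536
```

Because errors are squared before averaging, a few large misgradings raise RMSE more than many small ones, so a lower value such as 0.384 versus 0.44 indicates predictions that sit closer to the reference scores overall.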
Oct-7-2025
- Country:
- Asia > Indonesia (0.04)
- Europe
- Finland (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.14)
- Oxfordshire > Oxford (0.04)
- North America > Canada
- Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Education > Curriculum > Subject-Specific Education (0.35)
- Technology: