The ML-SUPERB 2.0 Challenge: Towards Inclusive ASR Benchmarking for All Language Varieties
William Chen, Chutong Meng, Jiatong Shi, Martijn Bartelds, Shih-Heng Wang, Hsiu-Hsuan Wang, Rafael Mosquera, Sara Hincapie, Dan Jurafsky, Antonis Anastasopoulos, Hung-yi Lee, Karen Livescu, Shinji Watanabe
arXiv.org Artificial Intelligence
Recent improvements in multilingual ASR have not been equally distributed across languages and language varieties. To advance state-of-the-art (SOTA) ASR models, we present the Interspeech 2025 ML-SUPERB 2.0 Challenge. We construct a new test suite that consists of data from 200+ languages, accents, and dialects to evaluate SOTA multilingual speech models. The challenge also introduces an online evaluation server based on DynaBench, allowing participants flexibility in model design and architecture. The challenge received 5 submissions from 3 teams, all of which outperformed our baselines. The best-performing submission achieved an absolute improvement in LID accuracy of 23% and a reduction in CER of 18% compared to the best baseline on a general multilingual test set. On accented and dialectal data, the best submission obtained 30.2% lower CER and 15.7% higher LID accuracy, showing the importance of community challenges in making speech technologies more inclusive.
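The two metrics the challenge reports, character error rate (CER) and language-identification (LID) accuracy, can be sketched in a few lines of Python. This is a minimal illustrative implementation, not the challenge's official scoring code; the function names are our own.

```python
# Illustrative implementations of the challenge metrics: CER (edit distance
# over reference characters) and LID accuracy (fraction of correct language
# labels). Names are hypothetical, not from the ML-SUPERB toolkit.

def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def cer(refs: list[str], hyps: list[str]) -> float:
    """Total edit distance divided by total reference characters."""
    return (sum(edit_distance(r, h) for r, h in zip(refs, hyps))
            / sum(len(r) for r in refs))

def lid_accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of utterances with the correct predicted language ID."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)
```

For example, a hypothesis with one substituted character in a four-character reference yields a CER of 0.25, and predicting the right language for one of two utterances yields a LID accuracy of 0.5.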
Sep-10-2025
- Country:
- Africa > Middle East
- Morocco > Casablanca-Settat Region > Casablanca (0.04)
- Asia > Taiwan (0.04)
- North America > United States
- Illinois > Cook County
- Chicago (0.04)
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- Genre:
- Research Report (0.50)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (0.94)
- Speech > Speech Recognition (0.98)