Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual Speech Recognition Evaluation

Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Adel Moumen, Sanchit Gandhi

Dec-11-2025–arXiv.org Artificial Intelligence

Despite rapid progress, ASR evaluation remains saturated with short-form English, and efficiency is rarely reported. We present the Open ASR Leaderboard, a fully reproducible benchmark and interactive leaderboard comparing 60+ open-source and proprietary systems across 11 datasets, including a dedicated multilingual track. We standardize text normalization and report both word error rate (WER) and inverse real-time factor (RTFx), enabling fair accuracy-efficiency comparisons. For English transcription, Conformer encoders paired with LLM decoders achieve the best average WER but are slower, while CTC and TDT decoders deliver much better RTFx, making them attractive for long-form and offline use. Whisper-derived encoders fine-tuned for English improve accuracy but often trade off multilingual coverage. All code and dataset loaders are open-sourced to support transparent, extensible evaluation.
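The two metrics named in the abstract can be illustrated with a minimal sketch. WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and RTFx is the audio duration divided by the processing time, so higher means faster. The `normalize` function below is a toy stand-in and not the leaderboard's actual normalizer. All function names, and the choice to pool errors over the whole corpus rather than average per utterance, are assumptions made for illustration.

```python
import re


def normalize(text):
    """Toy normalizer (assumption): lowercase, drop punctuation, split on whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return text.split()


def word_errors(ref, hyp):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words, after normalization."""
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = normalize(ref), normalize(hyp)
        errors += word_errors(ref_words, hyp_words)
        words += len(ref_words)
    return errors / words


def rtfx(audio_seconds, compute_seconds):
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    return audio_seconds / compute_seconds


if __name__ == "__main__":
    refs = ["Hello, world!", "the cat sat on the mat"]
    hyps = ["hello world", "the cat sat on mat"]
    print(f"WER:  {corpus_wer(refs, hyps):.3f}")
    print(f"RTFx: {rtfx(3600.0, 36.0):.1f}")
```

Normalizing both sides before scoring keeps casing and punctuation differences from counting as word errors, which is why a shared normalizer matters when comparing systems.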



