Style-agnostic evaluation of ASR using multiple reference transcripts
Quinten McNamara, Miguel Ángel del Río Fernández, Nishchal Bhandari, Martin Ratajczak, Danny Chen, Corey Miller, Migüel Jetté
arXiv.org Artificial Intelligence
Word error rate (WER) as a metric has a variety of limitations that have plagued the field of speech recognition. Evaluation datasets suffer from varying style, formality, and the inherent ambiguity of the transcription task. In this work, we attempt to mitigate some of these differences by performing style-agnostic evaluation of ASR systems using multiple references, each transcribed under opposing style parameters. As a result, we find that existing WER reports are likely significantly overestimating the number of contentful errors made by state-of-the-art ASR systems. In addition, we have found our multi-reference method to be a useful mechanism for comparing the quality of ASR models that differ in the stylistic makeup of their training data and target task.
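The core idea of multi-reference evaluation can be sketched as follows. This is an illustrative sketch, not the authors' exact method: a hypothesis is scored against several reference transcripts and the lowest WER is kept, so a stylistic choice covered by at least one reference (e.g. numerals vs. spelled-out numbers) is not counted as an error.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


def multi_reference_wer(references: list[str], hypothesis: str) -> float:
    """Score against every reference and keep the best (lowest) WER."""
    return min(word_error_rate(r, hypothesis) for r in references)


# Two references transcribed under opposing style parameters
# (verbatim vs. non-verbatim formatting); transcripts are hypothetical.
refs = ["it is twenty five dollars", "it's $25"]
hyp = "it's twenty five dollars"
print(multi_reference_wer(refs, hyp))  # lower than WER vs. either single reference alone
```

Against either single reference, the stylistic mismatch inflates the error count; taking the minimum over both references credits the hypothesis for matching at least one legitimate transcription style.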
Dec-10-2024
- Genre:
- Research Report (0.64)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (0.72)
- Natural Language (1.00)
- Speech > Speech Recognition (0.90)