A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision

Yuksel, Kamer Ali, Ferreira, Thiago, Gunduz, Ahmet, Al-Badrashiny, Mohamed, Javadi, Golara

Jun-21-2023–arXiv.org Artificial Intelligence

ABSTRACT The common standard for quality evaluation of automatic speech recognition (ASR) systems is reference-based metrics such as the Word Error Rate (WER), computed using manual ground-truth transcriptions that are time-consuming and expensive to obtain. This work proposes a multi-language referenceless quality metric, which allows comparing the performance of different ASR models on a speech dataset without ground truth transcriptions. To estimate the quality of ASR hypotheses, a pre-trained language model (LM) is fine-tuned with contrastive learning in a self-supervised learning manner. In experiments conducted on several unseen test datasets consisting of outputs from top commercial ASR engines in various languages, the proposed referenceless metric obtains a much higher correlation with WER scores and their ranks than the perplexity metric from the state-of-art multi-lingual LM in all experiments, and also reduces WER Figure 1. NoRefER fine-tunes a pre-trained language model in by more than 7% when used for ensembling hypotheses.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Jun-21-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States > California > Santa Clara County > Los Gatos (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language (1.00)
  - Machine Learning > Performance Analysis
    - Accuracy (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found