Hypotheses Paradise: An Open and Strong Baseline for Speech Recognition with Large Language Models

Chen Chen, Chao-Han Huck Yang

Feb-10-2025, 10:55:42 GMT–Neural Information Processing Systems

Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues, thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for predicting the true transcription. This approach is a paradigm shift from the traditional language model rescoring strategy, which can only select one candidate hypothesis as the output transcription.
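The contrast between rescoring and N-best error correction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy vocabulary-based scorer, the example hypotheses, and the prompt wording are hypothetical and are not taken from the benchmark itself. Rescoring can only return one of the given candidates, whereas the correction setup passes the whole N-best list to an LLM, which may compose a transcription that appears in none of them.

```python
from typing import Callable, Sequence


def rescore_select(hypotheses: Sequence[str], lm_score: Callable[[str], float]) -> str:
    """Traditional LM rescoring: return the single highest-scoring hypothesis."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    return max(hypotheses, key=lm_score)


def build_correction_prompt(hypotheses: Sequence[str]) -> str:
    """Format the N-best list as a prompt asking an LLM for the true transcription.

    The wording is a hypothetical template, not the benchmark's prompt.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))
    return (
        "Below are the N-best hypotheses from a speech recognizer.\n"
        f"{numbered}\n"
        "Write the most likely true transcription:"
    )


def toy_lm_score(text: str) -> float:
    """Hypothetical stand-in for a language model score: counts in-vocabulary words."""
    vocab = {"the", "weather", "is", "nice", "today"}
    return float(sum(word in vocab for word in text.split()))


# Made-up N-best list: each hypothesis contains a different error.
n_best = [
    "the whether is nice today",
    "the weather is nice to day",
    "the weather his nice today",
]

# Rescoring is limited to the candidates it is given.
selected = rescore_select(n_best, toy_lm_score)

# The correction prompt exposes all hypotheses, so the tokens needed for
# "the weather is nice today" are available even though no candidate matches it.
prompt = build_correction_prompt(n_best)
```

In this toy list every hypothesis is wrong in a different place, so any selection-based method returns an erroneous transcription, while the words needed for the correct one are spread across the candidates. The call that sends `prompt` to an actual LLM is deliberately left out, since it depends on the model and API in use.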


