Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
Andrei Andrusenko, Aleksandr Laptev, Vladimir Bataev, Vitaly Lavrukhin, Boris Ginsburg
arXiv.org Artificial Intelligence
Accurate recognition of rare and new words remains a pressing problem for contextualized Automatic Speech Recognition (ASR) systems. Most context-biasing methods modify either the ASR model or the beam-search decoding algorithm, which complicates model reuse and slows down inference. This work presents a new approach to fast context-biasing with a CTC-based Word Spotter (CTC-WS) for CTC and Transducer (RNN-T) ASR models. The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates. Valid candidates then replace their greedy-recognition counterparts in the corresponding frame intervals. A Hybrid Transducer-CTC model makes it possible to apply CTC-WS to the Transducer model. The results demonstrate significantly faster context-biased recognition together with improved F-score and WER compared to the baseline methods. The proposed method is publicly available in the NVIDIA NeMo toolkit.
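The abstract describes two steps: spotting context words by matching CTC log-probabilities against a context graph, then swapping the spotted words into the greedy transcript over the matching frames. The Python sketch that follows is a minimal illustration under our own assumptions, not the NeMo implementation. It assumes a character vocabulary with the blank at index 0, uses a plain prefix tree as the context graph, scores each path relative to the per-frame greedy (max) log-probability plus a hypothetical per-token `bonus`, and treats a candidate as valid when that biased score is positive, i.e. when it beats the greedy path over the same frames. The names `spot_words`, `select_candidates`, `greedy_words`, `apply_biasing`, `bonus`, `prune`, and the toy frames are illustrative only.

```python
import math
from dataclasses import dataclass, field

BLANK = 0  # assumption: token 0 is the CTC blank

@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can be dict keys
class Node:
    children: dict = field(default_factory=dict)  # token id -> Node
    word: str = ""  # non-empty where a context phrase ends

def build_context_graph(phrases, vocab):
    """Prefix tree over token ids, a simplified stand-in for the context graph."""
    root = Node()
    for phrase in phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(vocab[ch], Node())
        node.word = phrase
    return root

def spot_words(log_probs, root, bonus=2.0, prune=-10.0):
    """Return (word, start, end, score); score > 0 means the biased path
    beats the greedy (per-frame max) path over the same frames."""
    candidates, hyps = [], {}  # hyps: (node, last_token) -> (score, start)
    for t, frame in enumerate(log_probs):
        best = max(frame)
        cost = [lp - best for lp in frame]  # <= 0; the greedy token costs 0
        new = {}

        def push(key, score, start):
            # prune weak paths; keep the best path per state (Viterbi)
            if score > prune and (key not in new or score > new[key][0]):
                new[key] = (score, start)

        # a fresh hypothesis may start at every frame
        for (node, last), (score, start) in [*hyps.items(), ((root, None), (0.0, t))]:
            if node is not root:
                push((node, None), score + cost[BLANK], start)  # emit blank
                if last is not None:
                    push((node, last), score + cost[last], start)  # repeat token
            for tok, child in node.children.items():
                if tok != last:  # CTC needs a blank between identical tokens
                    s = score + cost[tok] + bonus
                    push((child, tok), s, start)
                    if child.word and s > 0:
                        candidates.append((child.word, start, t, s))
        hyps = new
    return candidates

def select_candidates(candidates):
    chosen = []  # keep the highest-scoring candidates that do not overlap
    for cand in sorted(candidates, key=lambda c: -c[3]):
        if all(cand[2] < c[1] or cand[1] > c[2] for c in chosen):
            chosen.append(cand)
    return chosen

def greedy_words(log_probs, tokens):
    """Greedy CTC decoding into (word, start_frame, end_frame) triples."""
    words, cur, start, end, prev = [], "", 0, 0, BLANK
    for t, frame in enumerate(log_probs):
        tok = max(range(len(frame)), key=frame.__getitem__)
        if tok != BLANK and tokens[tok] != " ":
            if tok != prev:  # CTC collapse: repeats without a blank merge
                if not cur:
                    start = t
                cur += tokens[tok]
            end = t
        elif tokens[tok] == " " and cur:  # word boundary
            words.append((cur, start, end))
            cur = ""
        prev = tok
    if cur:
        words.append((cur, start, end))
    return words

def apply_biasing(words, chosen):
    """Replace greedy words overlapping a spotted interval with the spotted word."""
    kept = [(s, w) for w, s, e in words
            if not any(s <= ce and cs <= e for _, cs, ce, _ in chosen)]
    return " ".join(w for _, w in sorted(kept + [(cs, w) for w, cs, _, _ in chosen]))

TOKENS = ["_", " ", "a", "e", "i", "m", "n", "o"]  # toy vocabulary, "_" = blank
VOCAB = {c: i for i, c in enumerate(TOKENS)}

def frame(top, second=None):  # toy frame: dominant token, optional runner-up
    p = [0.01] * len(TOKENS)
    p[VOCAB[top]] = 0.9 if second is None else 0.6
    if second is not None:
        p[VOCAB[second]] = 0.3
    return [math.log(x / sum(p)) for x in p]

log_probs = [frame("a"), frame(" "), frame("n"), frame("i", "e"), frame("m"), frame("o")]
words = greedy_words(log_probs, TOKENS)  # [("a", 0, 0), ("nimo", 2, 5)]
chosen = select_candidates(spot_words(log_probs, build_context_graph(["nemo"], VOCAB)))
print(apply_biasing(words, chosen))  # a nemo
```

With these toy frames, greedy decoding yields `a nimo`; the spotter finds `nemo` over frames 2 to 5, where `e` is only the runner-up at frame 3, and the merge step prints `a nemo`. Recombining hypotheses per graph state bounds the per-frame work by the number of active states, which is in the spirit of the speedup the abstract reports, although the actual method's pruning and acceptance rules may differ from these assumptions.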
Jun-11-2024
- Country:
- Europe > Switzerland (0.14)
- Genre:
- Research Report (0.70)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language (1.00)
- Representation & Reasoning (0.70)
- Speech > Speech Recognition (0.72)