Contextual-Utterance Training for Automatic Speech Recognition
Gomez-Alanis, Alejandro; Drude, Lukas; Schwarz, Andreas; Swaminathan, Rupak Vignesh; Wiesler, Simon
arXiv.org Artificial Intelligence
Recent studies of streaming automatic speech recognition (ASR) systems based on the recurrent neural network transducer (RNN-T) have fed the encoder with past contextual information in order to improve word error rate (WER). In this paper, we first propose a contextual-utterance training technique which makes use of the previous and future contextual utterances in order to adapt implicitly to the speaker, topic and acoustic environment. We also propose a dual-mode contextual-utterance training technique for streaming ASR systems. This approach makes better use of the available acoustic context in streaming models by distilling "in-place" the knowledge of a teacher, which can see both past and future contextual utterances, into a student, which can see only the current and past contextual utterances. The experimental results show that a conformer-transducer system trained with the proposed techniques outperforms the same system trained with the classical RNN-T loss. Specifically, the proposed technique reduces the WER by more than 6% relative and the average last token emission latency by more than 40 ms.
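The dual-mode idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_context`, `dual_mode_loss`), the use of a KL-divergence distillation term, and the `distill_weight` parameter are all assumptions made for illustration, and the per-mode transducer losses are passed in as plain numbers rather than computed.

```python
import math

def build_context(prev_utt, cur_utt, next_utt, streaming):
    """Concatenate feature frames of neighbouring utterances (hypothetical helper).

    The streaming (student) mode sees only past and current frames;
    the full-context (teacher) mode also sees the future utterance.
    """
    frames = list(prev_utt) + list(cur_utt)
    if not streaming:
        frames += list(next_utt)
    return frames

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def dual_mode_loss(student_loss, teacher_loss,
                   student_logits, teacher_logits, distill_weight=1.0):
    """Assumed combination: both modes' losses plus a distillation term.

    In a real framework the teacher posteriors would be detached so that
    gradients of the distillation term flow only through the student mode.
    """
    kd = kl_divergence(softmax(teacher_logits), softmax(student_logits))
    return student_loss + teacher_loss + distill_weight * kd

# Usage: the student context omits the future utterance.
student_frames = build_context([1, 2], [3], [4, 5], streaming=True)
teacher_frames = build_context([1, 2], [3], [4, 5], streaming=False)
```

Because both modes share one set of weights in dual-mode training, the distillation happens "in-place": the same model is run twice per batch, once per context configuration.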
Oct-27-2022
- Country:
- North America
- United States > California
- San Diego County > San Diego (0.04)
- Canada > Ontario
- Toronto (0.04)
- Europe
- United Kingdom > Scotland
- City of Edinburgh > Edinburgh (0.04)
- Ukraine > Lviv Oblast
- Lviv (0.04)
- Sweden > Stockholm
- Stockholm (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Germany > North Rhine-Westphalia
- Cologne Region > Aachen (0.04)
- Czechia > South Moravian Region
- Brno (0.04)
- Austria
- Asia
- Genre:
- Research Report (1.00)
- Technology: