Exploring Sequence-to-Sequence Transformer-Transducer Models for Keyword Spotting
Labrador, Beltrán; Zhao, Guanlong; Moreno, Ignacio López; Scarpati, Angelo Scorza; Fowl, Liam; Wang, Quan
arXiv.org Artificial Intelligence
In this paper, we present a novel approach to adapting a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token and training the system to detect that token in an audio stream. At inference time, we apply a decision function inspired by conventional KWS approaches to make our approach more suitable for the KWS task. Furthermore, we introduce a KWS-specific loss by adapting the sequence-discriminative Minimum Bayes-Risk training technique. We find that our approach significantly outperforms ASR-based KWS systems. Compared with a conventional keyword spotting system, our proposal achieves similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS system, our approach can improve performance at any operating point.
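The transcript-rewriting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the token string `<kw>`, the function name `replace_keyword`, the example keyword "hey assistant", and the case-insensitive whole-phrase matching rule are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical token name; the abstract only says "a special token".
KEYWORD_TOKEN = "<kw>"


def replace_keyword(transcript: str, keyword: str, token: str = KEYWORD_TOKEN) -> str:
    """Replace each whole-phrase occurrence of `keyword` in `transcript` with `token`.

    Matching is case-insensitive and refuses to match inside a longer word.
    Both choices are assumptions of this sketch, not details from the paper.
    """
    pattern = re.compile(
        r"(?<!\w)" + re.escape(keyword) + r"(?!\w)",
        flags=re.IGNORECASE,
    )
    # A callable replacement avoids any backslash interpretation in `token`.
    return pattern.sub(lambda _match: token, transcript)


if __name__ == "__main__":
    # "hey assistant" is a placeholder keyword, not one taken from the paper.
    print(replace_keyword("Hey Assistant, play some music", "hey assistant"))
```

Under these assumptions, the rewritten transcripts would serve as training targets, so that the model learns to emit the special token wherever the keyword is spoken.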
Nov-11-2022
- Country:
  - Oceania > Australia (0.04)
  - Asia > India (0.04)
  - South America > Chile
  - North America
    - United States (0.04)
    - Canada > Newfoundland and Labrador
      - Labrador (0.04)
  - Europe > Spain
- Genre:
  - Research Report (1.00)
- Industry:
  - Media (0.34)
- Technology: