Powerful and Extensible WFST Framework for RNN-Transducer Losses
Laptev, Aleksandr, Bataev, Vladimir, Gitman, Igor, Ginsburg, Boris
arXiv.org Artificial Intelligence
This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for the RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T rely on CUDA-related code, which is hard to extend and debug. WFSTs are easy to construct and extend, and they allow debugging through visualization. We introduce two WFST-powered RNN-T implementations: (1) "Compose-Transducer", based on the composition of WFST graphs built from acoustic and textual schemas -- computationally competitive and easy to modify; (2) "Grid-Transducer", which constructs the lattice directly for further computations -- the most compact and computationally efficient. We illustrate the ease of extensibility by introducing a new W-Transducer loss -- an adaptation of Connectionist Temporal Classification with wild cards. W-Transducer (W-RNNT) consistently outperforms the standard RNN-T in a weakly supervised setup with parts of the transcriptions missing at the beginning and end of utterances. All RNN-T losses are implemented with the k2 framework and are available in the NeMo toolkit.
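The "Compose-Transducer" idea above — building the loss lattice by composing an acoustic-schema WFST with a textual-schema WFST — can be sketched in plain Python. The paper's implementations use the k2 framework; the epsilon-free composition below, with its dict-based FST encoding and additive (log-domain) weights, is an illustrative assumption of mine, not the paper's API.

```python
# Hedged sketch: epsilon-free WFST composition. Each WFST is a dict:
#   {'start': state, 'final': {state: weight},
#    'arcs': [(src, dst, ilabel, olabel, weight), ...]}
# All structure and names here are illustrative, not taken from k2 or NeMo.
from collections import defaultdict

def compose(a, b):
    """Compose WFSTs a and b, matching a's output labels to b's input labels."""
    # Index b's arcs by (source state, input label) for fast matching.
    b_index = defaultdict(list)
    for src, dst, ilab, olab, w in b['arcs']:
        b_index[(src, ilab)].append((dst, olab, w))

    start = (a['start'], b['start'])
    arcs, finals = [], {}
    seen, stack = {start}, [start]
    while stack:
        sa, sb = stack.pop()
        if sa in a['final'] and sb in b['final']:
            finals[(sa, sb)] = a['final'][sa] + b['final'][sb]
        for src, dst, ilab, olab, w in a['arcs']:
            if src != sa:
                continue
            # A composed arc exists where a's output label matches
            # b's input label; weights add in the log domain.
            for dst_b, olab_b, w_b in b_index[(sb, olab)]:
                nxt = (dst, dst_b)
                arcs.append(((sa, sb), nxt, ilab, olab_b, w + w_b))
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {'start': start, 'final': finals, 'arcs': arcs}
```

In the paper's setting, composing the two schema graphs yields the transducer lattice over which the loss is computed; the "Grid-Transducer" variant skips composition and constructs that lattice directly, which is why it is more compact.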
Mar-18-2023