RNN-Transducer-based Losses for Speech Recognition on Noisy Targets

Bataev, Vladimir

arXiv.org Artificial Intelligence 

We start with the template "CM3015 Machine Learning and Neural Networks, Theme 1: Deep Learning on a public dataset," which describes the task of choosing a publicly available dataset and training a deep learning model on it. Accordingly, we work with a neural network-based end-to-end ASR system, using the LibriSpeech [2] dataset, a popular academic benchmark. We limit our task to RNN-Transducer [3] systems, which are widely used in production and provide state-of-the-art quality [4] in most cases. We go beyond the standard task and focus our research on making RNN-Transducer systems robust to noisy targets: unlike well-curated datasets, industrial training data contains various errors, caused by unreliable transcription sources or by the inability to transcribe noisy speech accurately. To address the problem of training on noisy data, we will analyze the impact of different types of errors in the training data on the quality of the RNN-Transducer system and explore different loss modifications to mitigate that impact. We will construct artificial training data by mutating correct transcripts from the LibriSpeech [2] training part, similar to the approaches used in related work, and try to achieve the best possible quality on the standard LibriSpeech development and test sets.
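The transcript mutation step can be illustrated with a minimal sketch. The text does not list the error types or the mutation procedure, so everything here is an assumption: the function name `mutate_transcript`, the three word-level error kinds (deletion, substitution, insertion), the per-word error `rate`, and the replacement vocabulary `vocab` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical set of word-level error kinds; the source text does not name them.
ERROR_KINDS = ("deletion", "substitution", "insertion")


def mutate_transcript(words, vocab, rate, kind, rng):
    """Return a noisy copy of `words` with one kind of word-level error.

    words: list of words of the correct transcript
    vocab: list of words to draw substitutions and insertions from
    rate:  probability of corrupting each word position
    kind:  one of ERROR_KINDS
    rng:   random.Random instance, passed in for reproducibility
    """
    if kind not in ERROR_KINDS:
        raise ValueError(f"unknown error kind: {kind}")
    noisy = []
    for word in words:
        if rng.random() >= rate:
            noisy.append(word)  # keep the word unchanged
        elif kind == "deletion":
            continue  # drop the word
        elif kind == "substitution":
            # replace the word with a different one, if any is available
            candidates = [v for v in vocab if v != word]
            noisy.append(rng.choice(candidates) if candidates else word)
        else:  # insertion: keep the word and add a random word after it
            noisy.append(word)
            noisy.append(rng.choice(vocab))
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "the cat sat on the mat".split()
    vocabulary = ["dog", "ran", "blue"]
    for kind in ERROR_KINDS:
        print(kind, mutate_transcript(reference, vocabulary, 0.3, kind, rng))
```

Applying one error kind at a time, as in this sketch, would make it possible to study the effect of each error type on the trained system in isolation; mixing kinds is an equally plausible design.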
