Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

Mošner, Ladislav, Wu, Minhua, Raju, Anirudh, Parthasarathi, Sree Hari Krishnan, Kumatani, Kenichi, Sundaram, Shiva, Maas, Roland, Hoffmeister, Björn

arXiv.org Machine Learning 

In this work, we adopt the teacher-student (T/S) learning technique using a parallel clean and noisy corpus to improve automatic speech recognition (ASR) performance under multimedia noise. On top of that, we apply a logits selection method which preserves only the k highest values, both to prevent wrong emphasis of knowledge from the teacher and to reduce the bandwidth needed for transferring data. We incorporate up to 8,000 hours of untranscribed data for training and present results for sequence-trained models in addition to cross-entropy-trained ones. The best sequence-trained student model yields relative word error rate (WER) reductions of approximately 10.1%, 28.7% and 19.6% on our clean, simulated noisy and real test sets, respectively, compared to a sequence-trained teacher.

Index Terms-- automatic speech recognition, noise robustness, teacher-student training, domain adaptation

1. INTRODUCTION

With the exponential growth of big data and computing power, automatic speech recognition (ASR) technology has been successfully used in many applications. People can do voice search using mobile devices.
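The abstract's logits selection idea (keep only the teacher's k highest posterior values per frame, then train the student on the pruned distribution) can be sketched as below. This is a minimal NumPy sketch under assumed details: the helper names are hypothetical, and the paper's exact selection and normalization scheme may differ.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def topk_teacher_targets(teacher_logits, k):
    # Keep only the k largest teacher posteriors per frame and
    # renormalize them to sum to one; all other classes get zero mass.
    # (Sketch of the logits-selection idea; details are assumptions.)
    probs = softmax(teacher_logits)
    idx = np.argpartition(probs, -k, axis=-1)[..., -k:]
    mask = np.zeros_like(probs)
    np.put_along_axis(mask, idx, 1.0, axis=-1)
    pruned = probs * mask
    return pruned / pruned.sum(axis=-1, keepdims=True)

def ts_cross_entropy(student_logits, teacher_targets):
    # Frame-level T/S loss: cross entropy between the pruned teacher
    # posteriors (soft targets) and the student's posteriors.
    log_q = np.log(softmax(student_logits) + 1e-12)
    return -(teacher_targets * log_q).sum(axis=-1).mean()
```

In a parallel-data setup, the teacher would be fed the clean utterance and the student the noise-corrupted version of the same utterance, so only the pruned top-k targets need to be stored or transmitted per frame.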
