Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Kannan, Anjuli, Datta, Arindrima, Sainath, Tara N., Weinstein, Eugene, Ramabhadran, Bhuvana, Wu, Yonghui, Bapna, Ankur, Chen, Zhifeng, Lee, Seungji

Sep-11-2019–arXiv.org Machine Learning

Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work presents an E2E multilingual system which is equipped to operate in low-latency interactive applications, as well as handle a key challenge of real world data: the imbalance in training data across languages. Using nine Indic languages, we compare a variety of techniques, and find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than both monolingual E2E models (eight of nine languages) and monolingual conventional systems (all nine languages). Index T erms: speech recognition, multilingual, RNN-T, residual adapter 1. Introduction Automatic speech recognition (ASR) systems that can transcribe speech in multiple languages, known as multilingual models, have gained popularity as an effective way to expand ASR coverage of the world's languages. Through shared learning of model elements across languages, they have been shown to outperform monolingual systems, particularly for those languages with less data.

language vector, multilingual model, speech recognition, (13 more...)

arXiv.org Machine Learning

Sep-11-2019

arXiv.org PDF

Add feedback

Country:
- Europe > Romania
  - Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > India
  - Karnataka > Bengaluru (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found