AITopics | Weinstein, Eugene

Collaborating Authors

Weinstein, Eugene

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Kannan, Anjuli, Datta, Arindrima, Sainath, Tara N., Weinstein, Eugene, Ramabhadran, Bhuvana, Wu, Yonghui, Bapna, Ankur, Chen, Zhifeng, Lee, Seungji

arXiv.org Machine LearningSep-11-2019

Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work presents an E2E multilingual system which is equipped to operate in low-latency interactive applications, as well as handle a key challenge of real world data: the imbalance in training data across languages. Using nine Indic languages, we compare a variety of techniques, and find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than both monolingual E2E models (eight of nine languages) and monolingual conventional systems (all nine languages). Index T erms: speech recognition, multilingual, RNN-T, residual adapter 1. Introduction Automatic speech recognition (ASR) systems that can transcribe speech in multiple languages, known as multilingual models, have gained popularity as an effective way to expand ASR coverage of the world's languages. Through shared learning of model elements across languages, they have been shown to outperform monolingual systems, particularly for those languages with less data.

deep learning, neural network, speech recognition, (18 more...)

arXiv.org Machine Learning

1909.0533

Country: Asia (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Multilingual Speech Recognition With A Single End-To-End Model

Toshniwal, Shubham, Sainath, Tara N., Weiss, Ron J., Li, Bo, Moreno, Pedro, Weinstein, Eugene, Rao, Kanishka

arXiv.org Artificial IntelligenceFeb-15-2018

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.

deep learning, neural network, speech recognition, (17 more...)

arXiv.org Artificial Intelligence

1711.01694

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback