Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition
Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu
ABSTRACT

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks. However, these models are computationally more expensive than N-gram LMs to decode with and are thus challenging to integrate into speech recognizers. Recent research has proposed lattice-rescoring algorithms using RNN LMs and LSTM LMs as an efficient strategy for integrating these models into a speech recognition system. In this paper, we evaluate existing lattice-rescoring algorithms, along with new variants, on a YouTube speech recognition task. Lattice rescoring using LSTM LMs reduces the word error rate (WER) on this task by 8% relative to the WER obtained with an N-gram LM.

Index Terms -- LSTM, language modeling, lattice rescoring, speech recognition

1. INTRODUCTION

A language model (LM) is a crucial component of a statistical speech recognition system [1]. An N-gram LM conditions each word on only a fixed, short window of preceding words. While this makes N-gram LMs powerful for tasks such as voice search, where short-range contexts suffice, they do not perform as well on tasks such as transcription of long-form speech content, which require modeling of long-range contexts [2].
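The lattice-rescoring idea described above can be sketched in code. The following is an illustrative toy, not the paper's implementation: the lattice, its arc scores, and the `lstm_lm_score` stand-in (which substitutes for a real LSTM LM) are all hypothetical. It shows push-forward rescoring, where hypotheses carrying word histories are propagated through the lattice and pruned to a beam at each state.

```python
import math
from collections import defaultdict

# Toy lattice: arcs are (from_state, to_state, word, acoustic_cost).
# A real lattice would come from a first-pass decode with an N-gram LM.
ARCS = [
    (0, 1, "the", 1.0),
    (0, 1, "a", 1.2),
    (1, 2, "cat", 0.5),
    (1, 2, "cap", 0.9),
    (2, 3, "sat", 0.4),
]
FINAL_STATE = 3

def lstm_lm_score(history, word):
    """Hypothetical stand-in for an LSTM LM: returns a cost
    (negative log-probability). Here it simply prefers the
    phrase 'the cat sat'."""
    good = {"the": 0.3, "cat": 0.2, "sat": 0.1}
    return good.get(word, 1.5)

def rescore_lattice(arcs, final_state, beam=2):
    """Push-forward rescoring: propagate (cost, word_history)
    hypotheses through the lattice in topological order,
    pruning to the `beam` best hypotheses per state."""
    out = defaultdict(list)
    for f, t, w, s in arcs:
        out[f].append((t, w, s))
    hyps = {0: [(0.0, ())]}  # per-state list of (cost, history)
    # States here are numbered in topological order already.
    for state in sorted(out):
        for cost, hist in hyps.get(state, []):
            for t, w, ac in out[state]:
                new = cost + ac + lstm_lm_score(hist, w)
                hyps.setdefault(t, []).append((new, hist + (w,)))
        for t in {t for t, _, _ in out[state]}:
            hyps[t] = sorted(hyps[t])[:beam]  # beam pruning
    best_cost, best_hist = min(hyps[final_state])
    return " ".join(best_hist), best_cost
```

Because distinct histories are kept as distinct hypotheses, the LSTM's unbounded context is respected; the beam bounds the cost that would otherwise grow with the number of paths through the lattice.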
Nov-15-2017