An Online Sequence-to-Sequence Model Using Partial Conditioning
Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences. This is because they generate an output sequence conditioned on an entire input sequence. In this paper, we present a Neural Transducer that can make incremental predictions as more input arrives, without redoing the entire computation. Unlike sequence-to-sequence models, the Neural Transducer computes the next-step distribution conditioned on the partially observed input sequence and the partially generated sequence.
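The incremental decoding loop the abstract describes can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the helper names (`transducer_decode`, `next_symbol`, the `EOB` marker) are ours, and a trivial stand-in policy replaces the learned next-step distribution. The point is the control flow: input arrives in blocks, and after each block the model emits symbols until it predicts an end-of-block marker, conditioning only on the input observed so far and the outputs generated so far.

```python
# Toy sketch of online, partially conditioned decoding (hypothetical names;
# the real Neural Transducer uses RNN encoders/decoders with attention).

EOB = "<e>"  # end-of-block symbol: "no more outputs for this block"

def transducer_decode(frames, block_size, next_symbol, max_per_block=5):
    """Greedy online decoding.

    frames: list of input frames (e.g. acoustic feature vectors).
    next_symbol: callable (observed_frames, outputs_so_far) -> symbol;
        a stand-in for the greedy choice of the learned next-step
        distribution.
    """
    outputs = []
    for start in range(0, len(frames), block_size):
        observed = frames[:start + block_size]   # partial input only
        for _ in range(max_per_block):           # bounded emissions per block
            sym = next_symbol(observed, outputs)
            if sym == EOB:
                break                            # wait for the next block
            outputs.append(sym)
    return outputs

# Trivial stand-in policy: emit roughly one symbol per two observed frames.
def dummy_policy(observed, outputs):
    return f"y{len(observed)}" if len(outputs) < len(observed) // 2 else EOB

print(transducer_decode(list(range(6)), block_size=2, next_symbol=dummy_policy))
# → ['y2', 'y4', 'y6']
```

Each emitted symbol is conditioned on a growing prefix of the input, so no prediction ever depends on frames that have not yet arrived, which is what allows the model to run without redoing the entire computation as new data comes in.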
Reviews: An Online Sequence-to-Sequence Model Using Partial Conditioning
This is a well-done paper. It attacks a worthwhile problem: how to construct and train a sequence-to-sequence model that can operate online instead of waiting for an entire input to be received. It clearly describes an architecture for solving the problem, and walks the reader through the design of each component: next-step prediction, the attention mechanism, and modeling the ends of blocks. It clearly explains the challenges that must be overcome to train the model and perform inference with it, and proposes reasonable approximate algorithms for both. The speech recognition experiments are reasonable; they demonstrate the utility of the transducer model and explore design issues such as maintaining recurrent state across block boundaries, block size, the design of the attention mechanism, and the depth of the model.
An Online Sequence-to-Sequence Model Using Partial Conditioning
Jaitly, Navdeep, Le, Quoc V., Vinyals, Oriol, Sutskever, Ilya, Sussillo, David, Bengio, Samy
An online sequence-to-sequence model for noisy speech recognition
Chiu, Chung-Cheng, Lawson, Dieterich, Luo, Yuping, Tucker, George, Swersky, Kevin, Sutskever, Ilya, Jaitly, Navdeep
Generative models have long been the dominant approach for speech recognition. The success of these models, however, relies on sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative: discriminative sequence-to-sequence models, which can almost match the accuracy of state-of-the-art generative models. While these models are easy to train, as they can be trained end-to-end in a single step, they have a practical limitation: they can only be used for offline recognition. This is because they require the entire input sequence to be available at the start of inference, an assumption that does not hold for instantaneous speech recognition. To address this problem, online sequence-to-sequence models were recently introduced. These models start producing outputs as data arrives and the model is confident enough to emit partial transcripts. Like sequence-to-sequence models, they are causal: the output produced by the model up to any time, $t$, affects the features that are computed subsequently. This makes the model inherently more powerful than generative models, which are unable to change features that are computed from the data. This paper makes two main contributions: an improvement to online sequence-to-sequence model training, and its application to noisy settings with mixed speech from two speakers.
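The contrast between offline and online conditioning that both abstracts draw can be written out explicitly. The notation below is ours, not the papers': $x_{1:T}$ is the input, $y_{1:S}$ the output, and $W$ a block size. A standard sequence-to-sequence model conditions every output symbol on the entire input, while a transducer-style online model conditions the outputs of block $b$ only on the partially observed input and the partially generated output:

```latex
% Offline seq2seq: every output symbol sees the whole input sequence.
P(y_{1:S} \mid x_{1:T}) = \prod_{i=1}^{S} p\!\left(y_i \mid x_{1:T},\, y_{<i}\right)

% Online transducer: outputs emitted during block b see only the first
% bW input frames and the outputs generated so far.
P(y_{1:S} \mid x_{1:T}) = \prod_{b=1}^{\lceil T/W \rceil} \;
    \prod_{i \in \mathrm{block}\, b} p\!\left(y_i \mid x_{1:bW},\, y_{<i}\right)
```

The second factorization is what makes the model causal: a prediction made during block $b$ can depend on (and influence) everything computed up to frame $bW$, but never on future input.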
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)