Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models
Prabhavalkar, Rohit; Sainath, Tara N.; Wu, Yonghui; Nguyen, Patrick; Chen, Zhifeng; Chiu, Chung-Cheng; Kannan, Anjuli
Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion, which corresponds to maximizing the log-likelihood of the data. However, system performance is usually measured in terms of word error rate (WER), not log-likelihood. Traditional ASR systems benefit from discriminative sequence training, which optimizes criteria such as state-level minimum Bayes risk (sMBR) that are more closely related to WER. In the present work, we explore techniques to train attention-based models to directly minimize expected word error rate. We consider two loss functions that approximate the expected number of word errors: one based on sampling from the model, and one based on N-best lists of decoded hypotheses; we find the N-best approach more effective than sampling. In experimental evaluations, the proposed training procedure improves performance by up to 8.2% relative to the baseline system. This allows us to train grapheme-based, uni-directional attention-based models which match the performance of a traditional, state-of-the-art, discriminative sequence-trained system on a mobile voice-search task.
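To make the N-best variant of the loss concrete, here is a minimal PyTorch sketch of an expected-word-error loss over an N-best list. The tensor shapes, the function name `mwer_loss`, and the variance-reducing baseline (subtracting the mean error count over the list) are illustrative assumptions on my part, not the authors' reference implementation.

```python
# Hypothetical sketch of an N-best expected-word-error (MWER-style) loss.
import torch

def mwer_loss(log_probs: torch.Tensor, word_errors: torch.Tensor) -> torch.Tensor:
    """Expected word-error loss over an N-best list.

    log_probs:   [batch, n_best] total model log-probabilities log P(y_i | x).
    word_errors: [batch, n_best] number of word errors W(y_i, y*) per hypothesis.
    """
    # Renormalize the model scores over the N-best list so the
    # hypothesis probabilities sum to one.
    probs = torch.softmax(log_probs, dim=-1)
    # Subtracting the mean error count is an assumed baseline choice;
    # it leaves the expected gradient unchanged but reduces its variance.
    relative_errors = word_errors - word_errors.mean(dim=-1, keepdim=True)
    # Expected (relative) number of word errors, averaged over the batch.
    return (probs * relative_errors).sum(dim=-1).mean()

# Example: one utterance with four decoded hypotheses.
log_probs = torch.tensor([[-1.2, -1.5, -2.0, -2.3]], requires_grad=True)
word_errors = torch.tensor([[0.0, 1.0, 2.0, 2.0]])
loss = mwer_loss(log_probs, word_errors)
loss.backward()  # pushes probability mass toward low-error hypotheses
```

The sampling-based variant mentioned in the abstract would replace the decoded N-best list with hypotheses drawn from the model itself, with the same expected-error objective.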
Dec-5-2017
- Genre:
  - Research Report (0.50)
- Technology:
  - Information Technology > Artificial Intelligence
  - Machine Learning
  - Neural Networks > Deep Learning (0.73)
  - Performance Analysis > Accuracy (0.83)
  - Statistical Learning (0.69)
  - Natural Language (1.00)
  - Speech > Speech Recognition (1.00)