End-to-end speech recognition with neon - Nervana

Feb-5-2017, 19:10:45 GMT–#artificialintelligence

Thus, given a sequence of frames corresponding to an utterance, the model is required to produce, for each frame, a probability distribution over the alphabet. During the training phase, the softmax outputs are fed into a CTC cost function (more on this shortly) which uses the actual transcripts to (i) score the model's predictions, and (ii) generate an error signal quantifying the accuracy of the model's predictions. The overall goal is to train the model to increase the overall score of its predictions relative to the actual transcripts. Empirically, we have found that using stochastic gradient descent with momentum paired with gradient clipping leads to the best performing models. Deeper networks (seven layers or more) also tend to perform better in general.

artificial intelligence, machine learning, sequence, (17 more...)

#artificialintelligence

Feb-5-2017, 19:10:45 GMT

News Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (0.70)
  - Machine Learning
    - Statistical Learning > Gradient Descent (0.55)
    - Neural Networks > Deep Learning (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found