Letter-Based Speech Recognition with Gated ConvNets

Liptchinsky, Vitaliy, Synnaeve, Gabriel, Collobert, Ronan

Feb-15-2019–arXiv.org Artificial Intelligence

In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC). In contrast to traditional phone (or senone)-based approaches, these "end-to-end'' approaches alleviate the need of word pronunciation modeling, and do not require a "forced alignment" step at training time. Phone-based approaches remain however state of the art on classical benchmarks. In this paper, we propose a letter-based speech recognition system, leveraging a ConvNet acoustic model. Key ingredients of the ConvNet are Gated Linear Units and high dropout. The ConvNet is trained to map audio sequences to their corresponding letter transcriptions, either via a classical CTC approach, or via a recent variant called ASG. Coupled with a simple decoder at inference time, our system matches the best existing letter-based systems on WSJ (in word error rate), and shows near state of the art performance on LibriSpeech.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

Feb-15-2019

arXiv.org PDF

Add feedback

Country:
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found