JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT
Mayumi Ohta, Julia Kreutzer, Stefan Riezler
arXiv.org Artificial Intelligence
JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automatic speech recognition (ASR) and end-to-end speech translation. It inherits the core philosophy of JoeyNMT, a minimalist NMT toolkit built on PyTorch, which aims for simplicity and accessibility. JoeyS2T's workflow is self-contained, covering data pre-processing, model training, prediction, and evaluation, and it is seamlessly integrated into JoeyNMT's compact and simple code base. On top of JoeyNMT's state-of-the-art Transformer-based encoder-decoder architecture, JoeyS2T provides speech-oriented components such as convolutional layers, SpecAugment, CTC loss, and word error rate (WER) evaluation. Despite being simpler than prior implementations, JoeyS2T performs competitively on English speech recognition and English-to-German speech translation benchmarks. The implementation is accompanied by a walk-through tutorial and is available at https://github.com/may-/joeys2t.
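WER, one of the evaluation components named above, is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The Python snippet below is a minimal sketch of that computation using the standard Levenshtein dynamic program. The function names `edit_distance` and `corpus_wer` are hypothetical and not part of JoeyS2T's API, and JoeyS2T's own scorer may apply text normalization (casing, punctuation) that this sketch deliberately omits.

```python
from typing import Sequence

# Hypothetical sketch of WER scoring; not JoeyS2T's actual implementation.


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn `ref` into `hyp` (Levenshtein distance over words)."""
    # prev holds the previous DP row: prev[j] = distance(ref[:i-1], hyp[:j]).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned one-to-one")
    errors = 0
    ref_words = 0
    for ref, hyp in zip(references, hypotheses):
        # Plain whitespace tokenization; no case or punctuation normalization.
        ref_tokens = ref.split()
        errors += edit_distance(ref_tokens, hyp.split())
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "hello world"]
    hyps = ["the cat sit on mat", "hello world"]
    # 2 edits (sat -> sit, missing "the") over 8 reference words
    print(f"WER = {corpus_wer(refs, hyps):.2%}")  # prints: WER = 25.00%
```

Summing edits and reference lengths over all utterances before dividing weights long utterances more heavily than short ones, so a corpus-level WER generally differs from the mean of per-utterance WERs. Because insertions are counted, WER can exceed 100%.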
October 5, 2022