SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain

Qi, Heli, Novitasari, Sashi, Tjandra, Andros, Sakti, Sakriani, Nakamura, Satoshi

Jan-7-2023–arXiv.org Artificial Intelligence

This paper introduces SpeeChain, an open-source PyTorch-based toolkit for developing the machine speech chain at large scale. This first release focuses on the TTS-to-ASR chain, a core component of the machine speech chain in which TTS synthesizes speech from unspoken text to augment the training data for ASR. To build an efficient pipeline for the large-scale TTS-to-ASR chain, we implement easy-to-use multi-GPU batch-level model inference, multi-dataloader batch generation, and on-the-fly data selection techniques. In this paper, we first explain the overall procedure of the TTS-to-ASR chain and the difficulties of each step. Then, we present a detailed ablation study on different types of unlabeled data, data filtering thresholds, batch composition, and real-synthetic data ratios. Our experimental results on train_clean_460 of LibriSpeech demonstrate that our TTS-to-ASR chain can significantly improve WER in a semi-supervised setting.
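The batch composition and data filtering ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not SpeeChain's API: the function names, the `(utterance_id, score)` representation, and the meaning of the score are all assumptions made for illustration, and the filtering here runs once per call rather than during training.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical representation: (utterance id, quality score in [0, 1]).
# What the score measures is an assumption; the abstract does not say.
ScoredUtt = Tuple[str, float]


def select_synthetic(pool: List[ScoredUtt], threshold: float) -> List[str]:
    """Keep ids of synthetic utterances whose score meets the threshold."""
    return [utt_id for utt_id, score in pool if score >= threshold]


def mixed_batches(
    real_ids: List[str],
    synthetic_pool: List[ScoredUtt],
    batch_size: int,
    real_per_batch: int,
    threshold: float,
    seed: int = 0,
) -> Iterator[List[str]]:
    """Yield batches with a fixed number of real and synthetic utterances."""
    if not 0 < real_per_batch < batch_size:
        raise ValueError("real_per_batch must be between 1 and batch_size - 1")
    synth_per_batch = batch_size - real_per_batch

    rng = random.Random(seed)
    real = list(real_ids)
    synth = select_synthetic(synthetic_pool, threshold)
    rng.shuffle(real)
    rng.shuffle(synth)

    # Stop when either source can no longer fill its share of a batch.
    n_batches = min(len(real) // real_per_batch, len(synth) // synth_per_batch)
    for i in range(n_batches):
        r = real[i * real_per_batch:(i + 1) * real_per_batch]
        s = synth[i * synth_per_batch:(i + 1) * synth_per_batch]
        yield r + s


# Usage: 8 real utterances, 10 synthetic ones scored 0.0 .. 0.9.
real = [f"r{i}" for i in range(8)]
synthetic = [(f"s{i}", i / 10) for i in range(10)]
batches = list(mixed_batches(real, synthetic, batch_size=4,
                             real_per_batch=2, threshold=0.5))
```

Fixing the per-batch counts keeps the real-synthetic ratio constant in every batch, which is one simple way to make that ratio a controllable setting; the toolkit itself may compose batches differently.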

artificial intelligence, machine learning, tts asr chain

Country:
- Europe > Slovenia
  - Drava > Municipality of Benedikt > Benedikt (0.04)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (0.51)
  - Machine Learning
    - Unsupervised or Indirectly Supervised Learning (0.35)
    - Neural Networks > Deep Learning (0.34)
