ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe

arXiv.org Artificial Intelligence 

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community. ESPnet-ST-v2 supports 1) offline speech-to-text translation (ST), 2) simultaneous speech-to-text translation (SST), and 3) offline speech-to-speech translation (S2ST) -- each task is supported with a wide variety of approaches, differentiating ESPnet-ST-v2 from other open source spoken language translation toolkits. This toolkit offers state-of-the-art architectures such as transducers, hybrid CTC/attention, multi-decoders with searchable intermediates, time-synchronous blockwise CTC/attention, Translatotron models, and direct discrete unit models. In this paper, we describe the overall design, example models for each task, and performance benchmarking behind ESPnet-ST-v2, which is publicly available at https://github.com/espnet/espnet.
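As a quick illustration of how the toolkit can be used for the first of these tasks, below is a minimal sketch of running pretrained offline ST inference from Python. The model tag is hypothetical, and the loader/constructor pattern is an assumption based on ESPnet2's model-zoo inference interfaces; consult the recipes in the repository for the supported entry points and available pretrained models.

```python
# Minimal sketch of offline speech-to-text translation (ST) inference.
# Assumptions: the espnet_model_zoo downloader and espnet2 ST inference class
# follow the same usage pattern as ESPnet2 ASR inference; the model tag below
# is hypothetical and should be replaced with a real tag from the model zoo.
import soundfile

from espnet_model_zoo.downloader import ModelDownloader
from espnet2.bin.st_inference import Speech2Text

downloader = ModelDownloader()
speech2text = Speech2Text(
    # Hypothetical tag; download_and_unpack is assumed to return the
    # config/model paths expected by the Speech2Text constructor.
    **downloader.download_and_unpack("espnet/example_must_c_en_de_st"),
    beam_size=10,
)

# Load a 16 kHz mono waveform and translate it.
speech, rate = soundfile.read("sample_en.wav")
nbest = speech2text(speech)
translation, tokens, token_ids, hypothesis = nbest[0]
print(translation)
```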
