Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Fukuda, Ryo, Sudoh, Katsuhito, Nakamura, Satoshi

Jul-13-2022, arXiv.org Artificial Intelligence

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular voice activity detection (VAD) tools such as WebRTC VAD generally rely on pause-based segmentation. Unfortunately, pauses in speech do not necessarily coincide with sentence boundaries, and sentences can be joined by a pause too short for VAD to detect. In this study, we propose a speech segmentation method that uses a binary classification model trained on a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD with this segmentation method. Experimental results show that the proposed method is better suited to cascade and end-to-end ST systems than conventional segmentation methods, and that the hybrid approach further improves translation performance.
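
To make the idea of combining a boundary classifier with VAD concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract does not specify how the two signals are combined, so the combination rule (split on a frame if the classifier's boundary probability crosses a threshold, or if the frame lies in a sufficiently long VAD pause), the function name `hybrid_segments`, and the parameters `prob_threshold` and `min_pause_frames` are all assumptions chosen for illustration. The inputs are assumed to be per-frame classifier probabilities and per-frame VAD speech flags that have already been computed.

```python
from typing import List, Sequence, Tuple


def hybrid_segments(
    boundary_probs: Sequence[float],
    vad_is_speech: Sequence[bool],
    prob_threshold: float = 0.5,   # assumed decision threshold for the classifier
    min_pause_frames: int = 3,     # assumed minimum pause length that VAD may split on
) -> List[Tuple[int, int]]:
    """Return segments as (start, end) frame indices, end exclusive.

    Illustrative rule only: a frame is a split point if the classifier marks
    it as a boundary OR it belongs to a non-speech run of at least
    `min_pause_frames` frames.
    """
    if len(boundary_probs) != len(vad_is_speech):
        raise ValueError("inputs must have the same number of frames")

    n = len(boundary_probs)

    # Mark frames that belong to a long enough pause according to VAD.
    in_long_pause = [False] * n
    i = 0
    while i < n:
        if vad_is_speech[i]:
            i += 1
            continue
        j = i
        while j < n and not vad_is_speech[j]:
            j += 1
        if j - i >= min_pause_frames:
            for k in range(i, j):
                in_long_pause[k] = True
        i = j

    # Collect maximal runs of frames that are not split points.
    segments: List[Tuple[int, int]] = []
    start = None
    for t in range(n):
        is_split = boundary_probs[t] >= prob_threshold or in_long_pause[t]
        if is_split:
            if start is not None:
                segments.append((start, t))
                start = None
        elif start is None:
            start = t
    if start is not None:
        segments.append((start, n))
    return segments
```

In this sketch a short pause that VAD alone would miss can still produce a split when the classifier is confident, while a long silence produces a split even when the classifier is not. A practical system would likely also enforce minimum and maximum segment lengths, which this sketch omits.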



Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Acoustic Processing (1.00)
  - Natural Language > Machine Translation (1.00)
