Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation

Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler

Mar-16-2022, arXiv.org Artificial Intelligence

End-to-end speech translation relies on data that pair source-language speech inputs with corresponding translations into a target language. Such data are notoriously scarce, making synthetic data augmentation by back-translation or knowledge distillation a necessary ingredient of end-to-end training. In this paper, we present a novel approach to data augmentation that leverages audio alignments, linguistic properties, and translation. First, we augment a transcription by sampling from a suffix memory that stores text and audio data. Second, we translate the augmented transcript. Finally, we recombine concatenated audio segments and the generated translation. Apart from training an MT system, we use only basic off-the-shelf components without fine-tuning. While its resource demands are similar to those of knowledge distillation, adding our method delivers consistent improvements of up to 0.9 BLEU points on five language pairs on CoVoST 2 and up to 1.1 BLEU points on two language pairs on Europarl-ST.
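The three steps named in the abstract (sample, translate, recombine) can be illustrated with a rough Python sketch. It is not the authors' implementation. It assumes that word-level audio alignments are already available as sample-index spans, that a "pivot" word is chosen by a caller-supplied predicate (the abstract only says that linguistic properties are used), and that `translate` stands in for the trained MT system. All names (`Utterance`, `build_suffix_memory`, `augment`, `is_pivot`) are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Suffix = Tuple[List[str], List[float]]  # (suffix words, suffix audio samples)


@dataclass
class Utterance:
    words: List[str]               # transcript tokens
    spans: List[Tuple[int, int]]   # per-word (start, end) sample indices (assumed given)
    audio: List[float]             # raw audio samples


def build_suffix_memory(
    utterances: List[Utterance], is_pivot: Callable[[str], bool]
) -> Dict[str, List[Suffix]]:
    """Store, for every pivot word, the text and audio that follow it."""
    memory: Dict[str, List[Suffix]] = defaultdict(list)
    for utt in utterances:
        for i, word in enumerate(utt.words):
            if is_pivot(word) and i + 1 < len(utt.words):
                suffix_start = utt.spans[i + 1][0]
                memory[word].append((utt.words[i + 1:], utt.audio[suffix_start:]))
    return memory


def augment(
    utt: Utterance,
    memory: Dict[str, List[Suffix]],
    translate: Callable[[str], str],
    is_pivot: Callable[[str], bool],
    rng: random.Random,
) -> Optional[Tuple[List[float], List[str], str]]:
    """Return (audio, transcript, translation) for one augmented example, or None."""
    for i, word in enumerate(utt.words):
        candidates = memory.get(word)
        if is_pivot(word) and candidates:
            # Sample: pick a stored suffix that shares the pivot word.
            suffix_words, suffix_audio = rng.choice(candidates)
            new_words = utt.words[: i + 1] + suffix_words
            # Translate: the MT system produces the new target side.
            new_translation = translate(" ".join(new_words))
            # Recombine: concatenate prefix audio (up to the pivot's end) with suffix audio.
            new_audio = utt.audio[: utt.spans[i][1]] + suffix_audio
            return new_audio, new_words, new_translation
    return None
```

In this sketch the memory is keyed by the surface form of the pivot word, which is only one possible choice. A real system would also have to obtain the alignments from a forced aligner and decide how to handle audio from different speakers, neither of which is modeled here.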

covost 2, machine learning, natural language


Country:
- North America
  - United States
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > San Diego County
      - San Diego (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - Ontario > Toronto (0.04)
- Europe
  - United Kingdom > England
    - East Sussex > Brighton (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Germany
    - Saxony > Dresden (0.04)
    - Berlin (0.04)
  - Czechia > South Moravian Region
    - Brno (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - Austria > Styria
    - Graz (0.04)
- Asia
  - China > Hong Kong (0.04)
  - South Korea > Seoul
    - Seoul (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Machine Translation (1.00)
  - Machine Learning (1.00)
