CMU's IWSLT 2024 Simultaneous Speech Translation System

Xu, Xi, Ouyang, Siqi, Yan, Brian, Fernandes, Patrick, Chen, William, Li, Lei, Neubig, Graham, Watanabe, Shinji

Aug-14-2024–arXiv.org Artificial Intelligence

This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner. Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder. We employ a two-stage training approach: initially, we align the representations of speech and text, followed by full fine-tuning. Both stages are trained on MuST-c v2 data with cross-entropy loss. We adapt our offline ST model for SST using a simple fixed hold-n policy. Experiments show that our model obtains an offline BLEU score of 31.1 and a BLEU score of 29.5 under 2 seconds latency on the MuST-C-v2 tst-COMMON.

iwslt 2024, speech, translation, (12 more...)

arXiv.org Artificial Intelligence

Aug-14-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.04)
- Europe
  - Italy (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Natural Language > Machine Translation (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found