GMU Systems for the IWSLT 2025 Low-Resource Speech Translation Shared Task
Chutong Meng, Antonios Anastasopoulos
arXiv.org Artificial Intelligence
This paper describes the GMU systems for the IWSLT 2025 low-resource speech translation shared task. We trained systems for all language pairs, except for Levantine Arabic. We fine-tuned SeamlessM4T-v2 for automatic speech recognition (ASR), machine translation (MT), and end-to-end speech translation (E2E ST). The ASR and MT models are also used to form cascaded ST systems. Additionally, we explored various training paradigms for E2E ST fine-tuning, including direct E2E fine-tuning, multi-task training, and parameter initialization using components from fine-tuned ASR and/or MT models. Our results show that (1) direct E2E fine-tuning yields strong results; (2) initializing with a fine-tuned ASR encoder improves ST performance on languages SeamlessM4T-v2 has not been trained on; (3) multi-task training can be slightly helpful.
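The encoder-initialization paradigm mentioned above can be sketched roughly as follows: take the speech encoder of an ASR model fine-tuned on the target language and use its weights to initialize the encoder of the end-to-end ST model before ST fine-tuning. This is an illustrative sketch only; the module layout and dimensions are hypothetical stand-ins, not the actual SeamlessM4T-v2 architecture or the authors' code.

```python
# Hypothetical sketch of ASR-encoder initialization for E2E ST.
# Module names and sizes are illustrative, not SeamlessM4T-v2's.
import torch
import torch.nn as nn

class SpeechModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=1000):
        super().__init__()
        # Shared speech encoder; the output head is task-specific
        # (transcription for ASR, translation for ST).
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.head = nn.Linear(hidden, vocab)

# Pretend `asr` has already been fine-tuned on target-language ASR data.
asr = SpeechModel()
st = SpeechModel()

# Initialize the ST encoder from the fine-tuned ASR encoder,
# then fine-tune `st` end-to-end on speech-translation pairs.
st.encoder.load_state_dict(asr.encoder.state_dict())
```

The head is left freshly initialized because it targets a different output space (translations rather than transcripts); only the acoustic encoder transfers.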
May 29, 2025