GMU Systems for the IWSLT 2025 Low-Resource Speech Translation Shared Task
Chutong Meng, Antonios Anastasopoulos
arXiv.org Artificial Intelligence
This paper describes the GMU systems for the IWSLT 2025 low-resource speech translation shared task. We trained systems for all language pairs, except for Levantine Arabic. We fine-tuned SeamlessM4T-v2 for automatic speech recognition (ASR), machine translation (MT), and end-to-end speech translation (E2E ST). The ASR and MT models are also used to form cascaded ST systems. Additionally, we explored various training paradigms for E2E ST fine-tuning, including direct E2E fine-tuning, multi-task training, and parameter initialization using components from fine-tuned ASR and/or MT models. Our results show that (1) direct E2E fine-tuning yields strong results; (2) initializing with a fine-tuned ASR encoder improves ST performance on languages SeamlessM4T-v2 has not been trained on; (3) multi-task training can be slightly helpful.
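The encoder-initialization paradigm mentioned above can be sketched roughly as follows: take the speech encoder of an ASR model fine-tuned on the target language and use its weights to initialize the encoder of the end-to-end ST model before ST fine-tuning. This is an illustrative sketch only; the module layout and dimensions are hypothetical stand-ins, not the actual SeamlessM4T-v2 architecture or the authors' code.

```python
# Hypothetical sketch of ASR-encoder initialization for E2E ST.
# Module names and sizes are illustrative, not SeamlessM4T-v2's.
import torch
import torch.nn as nn

class SpeechModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=1000):
        super().__init__()
        # Shared speech encoder; the output head is task-specific
        # (transcription for ASR, translation for ST).
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.head = nn.Linear(hidden, vocab)

# Pretend `asr` has already been fine-tuned on target-language ASR data.
asr = SpeechModel()
st = SpeechModel()

# Initialize the ST encoder from the fine-tuned ASR encoder,
# then fine-tune `st` end-to-end on speech-translation pairs.
st.encoder.load_state_dict(asr.encoder.state_dict())
```

The head is left freshly initialized because it targets a different output space (translations rather than transcripts); only the acoustic encoder transfers.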
May 29, 2025