Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23
Tsiamas, Ioannis; Gállego, Gerard I.; Fonollosa, José A. R.; Costa-jussà, Marta R.
arXiv.org Artificial Intelligence
In the past decade, the field of Speech Translation (ST) has seen significant advancements, mainly due to end-to-end models that directly translate speech, offering a more efficient method compared [...] Despite data availability challenges, recent progress has diminished the performance disparity between these approaches (Bentivogli et al., 2021; Potapczyk and Przybysz, 2020; Inaguma et al., [...]
Gállego et al. (2021); Zhao et al. (2022) aimed to [...] Han et al. (2021) tackled the issue by projecting speech and text features [...] In our work, we tackle the issue of misaligned speech and text encoder representations by adopting the approach proposed by Le et al. (2023). [...] on English ASR, wav2vec 2.0 (Baevski et al., 2020), and an MT foundation model fine-tuned on multilingual MT (En-Xx), mBART50 (Tang et al., 2020), as described in Section 2.1.
Jun-2-2023
- Country:
  - Asia (1.00)
  - Europe (0.93)
  - North America > United States > Minnesota (0.14)
- Genre:
  - Research Report > Experimental Study (0.46)
  - Research Report > New Finding (0.46)
- Technology:
  - Information Technology > Artificial Intelligence > Machine Learning (1.00)
  - Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
  - Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)