Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana, Nauman Dawalatabad, Antoine Laurent, Luis Vicente, Pablo Gimeno, Victoria Mingote, James Glass
–arXiv.org Artificial Intelligence
Research in multilingual speech-to-text translation is topical. Having a single model that supports multiple translation tasks is desirable. The goal of this work is to improve cross-lingual transfer learning in multilingual speech-to-text translation via semantic knowledge distillation. We show that by initializing the encoder of an encoder-decoder sequence-to-sequence translation model with SAMU-XLS-R, a multilingual speech transformer encoder trained using multi-modal (speech-text) semantic knowledge distillation, we achieve significantly better cross-lingual task knowledge transfer than with the baseline XLS-R, a multilingual speech transformer encoder trained via self-supervised learning. We demonstrate the effectiveness of our approach on two popular datasets, namely CoVoST-2 and Europarl. On the 21 translation tasks of the CoVoST-2 benchmark, we achieve an average improvement of 12.8 BLEU points over the baselines. In the zero-shot translation scenario, we achieve average gains of 18.8 and 11.9 BLEU points on unseen medium- and low-resource languages. We make similar observations on the Europarl speech translation benchmark.
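For a concrete picture of the "multi-modal (speech-text) semantic knowledge distillation" the abstract refers to, here is a minimal sketch in the style of the SAMU-XLS-R objective: a pooled utterance embedding from the speech encoder is pulled toward the transcript's sentence embedding from a frozen multilingual text encoder (LaBSE in the SAMU-XLS-R paper) via a cosine loss. The module names, dimensions, and pooling choice below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticDistillation(nn.Module):
    """Sketch of speech-text semantic knowledge distillation.

    The speech encoder's pooled utterance embedding is trained to match the
    (precomputed, frozen) text-encoder embedding of the transcript.
    All names and dimensions here are assumptions for illustration.
    """

    def __init__(self, speech_encoder: nn.Module, speech_dim: int = 1024, text_dim: int = 768):
        super().__init__()
        self.speech_encoder = speech_encoder         # e.g. an XLS-R-style encoder, trainable
        self.proj = nn.Linear(speech_dim, text_dim)  # map speech space to text-embedding space

    def forward(self, waveform: torch.Tensor, text_embedding: torch.Tensor) -> torch.Tensor:
        frames = self.speech_encoder(waveform)       # (batch, time, speech_dim) frame features
        utterance = frames.mean(dim=1)               # mean-pool over time to one vector
        speech_embedding = torch.tanh(self.proj(utterance))
        # Minimize (1 - cosine similarity) against the frozen text-encoder target
        return 1.0 - F.cosine_similarity(speech_embedding, text_embedding).mean()
```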
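The fine-tuning setup the abstract evaluates, an encoder-decoder speech translation model whose encoder is initialized from a pretrained multilingual speech encoder, can be sketched with Hugging Face's `SpeechEncoderDecoderModel`. The public XLS-R checkpoint below stands in for the baseline; the paper's contribution is swapping in SAMU-XLS-R encoder weights. The checkpoint names and the mBART decoder choice are assumptions for illustration, not the authors' configuration.

```python
from transformers import SpeechEncoderDecoderModel

# Build a sequence-to-sequence ST model: encoder weights come from a
# pretrained multilingual speech encoder, decoder from a text model.
# Swapping XLS-R for SAMU-XLS-R weights here is the paper's key change.
model = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    "facebook/wav2vec2-xls-r-300m",  # baseline XLS-R encoder (public stand-in)
    "facebook/mbart-large-50",       # decoder choice is an illustrative assumption
)

# Propagate decoder token settings to the joint config before fine-tuning
model.config.decoder_start_token_id = model.decoder.config.decoder_start_token_id
model.config.pad_token_id = model.decoder.config.pad_token_id
```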
Jan-25-2024
- Country:
  - Europe (1.00)
  - North America > United States
    - California > San Francisco County > San Francisco (0.14)
    - Michigan (0.14)
- Genre:
- Research Report (0.52)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language > Machine Translation (1.00)
    - Speech > Speech Recognition (1.00)