Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
Zhang, Yuhao, Xu, Chen, Hu, Bojie, Zhang, Chunliang, Xiao, Tong, Zhu, Jingbo
arXiv.org Artificial Intelligence
We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It improves the model's ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). As a result, the speech translation model can learn from both unlabeled and labeled data, especially when source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can handle both normal and noisy text data. Our system achieves new state-of-the-art results on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.
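The abstract does not specify how the noisy text for the denoising text encoder is produced. The sketch below is a generic, hypothetical illustration of one common way to corrupt source-language text before feeding it to an encoder: random token dropping and masking. The function name `add_text_noise`, the `<mask>` token, and the probabilities `p_mask` and `p_drop` are assumptions for illustration only, not the authors' scheme.

```python
import random
from typing import List

MASK = "<mask>"  # hypothetical mask token; not taken from the paper


def add_text_noise(tokens: List[str], p_mask: float = 0.1, p_drop: float = 0.1,
                   seed: int = 0) -> List[str]:
    """Corrupt a token sequence for denoising-style training (assumed scheme).

    Each token is independently dropped with probability p_drop; a kept token
    is replaced by MASK with probability p_mask. This only loosely imitates
    recognition-like errors such as deletions and unrecognized words.
    """
    if not (0.0 <= p_mask <= 1.0 and 0.0 <= p_drop <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible corruption
    noisy: List[str] = []
    for tok in tokens:
        if rng.random() < p_drop:
            continue  # simulate a deleted token
        if rng.random() < p_mask:
            noisy.append(MASK)  # simulate an unrecognized token
        else:
            noisy.append(tok)  # keep the clean token
    if not noisy and tokens:
        noisy.append(tokens[0])  # never return an empty input for non-empty text
    return noisy
```

Under this assumption, an encoder trained on a mix of clean inputs and `add_text_noise(...)` outputs would see both normal and noisy text, which is the robustness property the abstract describes.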
Dec-4-2022
- Country:
- North America > United States
- Washington > King County > Seattle (0.04)
- Asia > China
- Liaoning Province > Shenyang (0.04)
- Beijing > Beijing (0.04)
- Genre:
- Research Report (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Speech > Speech Recognition (1.00)
- Natural Language > Machine Translation (1.00)
- Machine Learning (1.00)