Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee
arXiv.org Artificial Intelligence
This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that self-supervised models can achieve performance close to that of the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that these models still have room for improvement, as they kept making similar mistakes and performed unsatisfactorily when modeling intra-sentential code-switching. In addition, we explored the validity of several Whisper variants and concluded that they remain effective in the code-switching scenario; similar techniques for self-supervised models are worth studying to boost performance on code-switched tasks.
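As a rough illustration of the zero-shot setting the abstract describes (not the authors' evaluation code), the sketch below transcribes a single Mandarin-English code-switched recording with the public Whisper-large-v3 checkpoint via the Hugging Face transformers ASR pipeline; the audio file name and device are hypothetical placeholders.

```python
# Minimal sketch, assuming the public openai/whisper-large-v3 checkpoint and a
# local code-switched audio file; not the paper's exact evaluation pipeline.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",  # or device=-1 for CPU
)

# Whisper predicts the language token itself, so no language is forced here;
# code-switched segments are decoded as-is into the output text.
result = asr("code_switched_example.wav")  # hypothetical file
print(result["text"])
```

Scoring such output against a reference (e.g., with mixed error rate for code-switched ASR) would then quantify the zero-shot generalizability the paper investigates.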
Dec-30-2023