Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training
Dong, Lukuan, Qin, Donghong, Bai, Fengbo, Song, Fanhua, Liu, Yan, Xu, Chen, Ou, Zhijian
–arXiv.org Artificial Intelligence
The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme or subword based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense that the annotated speech is very limited. With less than 10 hours of transcribed Iu Mien language, this paper investigates and compares the three approaches for Iu Mien speech recognition.

In our practice, it takes non-trivial efforts to collect and transcribe even less than 10 hours of Iu Mien language. The development of Iu Mien language speech recognition systems is therefore very challenging, while it is very important for reducing digital divides and for culture inheritance. The paradigm of pre-training (PT) followed by fine-tuning (FT), called the PTFT paradigm, has emerged in recent years as an effective way to solve the problem of limited training data for low-resource languages in ASR. In pre-training, training data for a number of languages are merged to train a multilingual model. The pre-trained model can then serve as a backbone for fine-tuning on the target language.
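To make the PTFT paradigm concrete, the sketch below shows one common way such a setup can be wired up in PyTorch: a pre-trained multilingual acoustic encoder is reused as a backbone, a fresh output layer over the target-language units is attached, and the whole model is fine-tuned with CTC loss on the small transcribed corpus. This is an illustrative sketch, not the authors' implementation; the encoder architecture, checkpoint path, and vocabulary size are hypothetical placeholders.

```python
# Illustrative PT-FT sketch (not the paper's code): reuse a pre-trained
# multilingual encoder as a backbone and fine-tune it with CTC on a small
# target-language (Iu Mien) dataset. Paths and sizes are hypothetical.
import os
import torch
import torch.nn as nn

class AcousticEncoder(nn.Module):
    """Stand-in backbone; in practice this would be a multilingual
    phoneme-supervised or self-supervised pre-trained model."""
    def __init__(self, feat_dim=80, hidden_dim=256, num_layers=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feats):                   # feats: (batch, time, feat_dim)
        return self.encoder(self.proj(feats))   # (batch, time, hidden_dim)

class TargetLanguageCTCModel(nn.Module):
    """Pre-trained backbone plus a new output layer for target-language units."""
    def __init__(self, encoder, hidden_dim, vocab_size):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats):
        return self.head(self.encoder(feats)).log_softmax(dim=-1)

encoder = AcousticEncoder()
ckpt_path = "multilingual_pretrained_encoder.pt"   # hypothetical checkpoint
if os.path.exists(ckpt_path):
    encoder.load_state_dict(torch.load(ckpt_path, map_location="cpu"))

vocab_size = 64                                    # hypothetical unit inventory (incl. CTC blank)
model = TargetLanguageCTCModel(encoder, hidden_dim=256, vocab_size=vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

# One fine-tuning step on dummy tensors standing in for the <10 h of transcribed speech.
feats = torch.randn(2, 300, 80)                    # (batch, frames, filterbank dims)
targets = torch.randint(1, vocab_size, (2, 30))    # target unit indices
log_probs = model(feats).transpose(0, 1)           # CTCLoss expects (time, batch, vocab)
input_lens = torch.full((2,), 300, dtype=torch.long)
target_lens = torch.full((2,), 30, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lens, target_lens)
loss.backward()
optimizer.step()
```

Whether the backbone comes from phoneme-based supervised pre-training, subword-based supervised pre-training, or self-supervised pre-training mainly changes how the encoder was trained; the fine-tuning step on the low-resourced target language looks much the same.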
Jul-18-2024