An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

Cheng Gong, Erica Cooper, Xin Wang, Chunyu Qiang, Mengzhe Geng, Dan Wells, Longbiao Wang, Jianwu Dang, Marc Tessier, Aidan Pine, Korin Richmond, Junichi Yamagishi

Jun-13-2024–arXiv.org Artificial Intelligence

Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite recent advances, language adaptation in TTS systems remains an open problem. This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system proposed in our previous work. We conducted experiments on 12 languages using limited data with various fine-tuning configurations. We demonstrate that the phonetic similarity between the pre-training and target languages, as well as the language category, affects adaptation performance on the target language. Additionally, we find that the fine-tuning dataset size and the number of speakers influence adaptability. Surprisingly, we also observe that fine-tuning on paired data is not always better than fine-tuning on audio-only data. Beyond speech intelligibility, our analysis covers speaker similarity, language identification, and predicted MOS.

artificial intelligence, machine learning, natural language

Country:
- Asia > China (0.15)

Genre:
- Research Report > New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.46)
  - Natural Language (1.00)
  - Speech (1.00)
