Semi-Supervised Learning Based on Reference Model for Low-resource TTS

Zhang, Xulong, Wang, Jianzong, Cheng, Ning, Xiao, Jing

Oct-25-2022–arXiv.org Artificial Intelligence

Most previous neural text-to-speech (TTS) methods are based on supervised learning, so they depend on a large training dataset and struggle to achieve comparable performance under low-resource conditions. To address this issue, we propose a semi-supervised learning method for neural TTS when labeled target data is limited; the method also addresses the exposure bias problem of earlier autoregressive models. Specifically, we pre-train a reference model based on FastSpeech2 on abundant source data and then fine-tune it on a limited target dataset. Meanwhile, pseudo labels generated by the original reference model further guide the training of the fine-tuned model, providing a regularization effect that reduces overfitting to the limited target data. Experimental results show that the proposed semi-supervised learning scheme significantly improves voice quality on test data with limited target data, yielding more natural and robust synthesized speech.
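The pseudo-label regularization described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss form, so everything here is an assumption made for illustration: the L1 distance, the names `l1_loss` and `semi_supervised_loss`, the `pseudo_weight` parameter, and the idea that pseudo labels are computed on inputs without paired audio. Frame vectors are plain lists of floats standing in for mel-spectrogram frames.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length frame vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def semi_supervised_loss(
    student_labeled: Sequence[float],
    ground_truth: Sequence[float],
    student_unlabeled: Sequence[float],
    reference_pseudo_label: Sequence[float],
    pseudo_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Supervised loss on target data plus a pseudo-label regularizer.

    reference_pseudo_label is the output of the frozen, pre-trained
    reference model; the fine-tuned (student) model is penalized for
    drifting away from it. This is an assumed form, not the authors' loss.
    """
    supervised = l1_loss(student_labeled, ground_truth)
    regularizer = l1_loss(student_unlabeled, reference_pseudo_label)
    return supervised + pseudo_weight * regularizer


# Example: supervised error 1.0, pseudo-label error 1.0, weight 0.5
loss = semi_supervised_loss([1.0, 2.0], [1.0, 4.0], [0.0, 0.0], [1.0, 1.0])
```

In this sketch the second term vanishes when the fine-tuned model agrees with the reference model, which is one way such a term could act as the regularizer the abstract mentions.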

artificial intelligence, inductive learning, machine learning

Country:
- Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Unsupervised or Indirectly Supervised Learning (1.00)
  - Inductive Learning (1.00)
  - Neural Networks > Deep Learning (0.68)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.68)
