Guided-TTS: Text-to-Speech with Untranscribed Speech
Heeseung Kim, Sungwon Kim, Sungroh Yoon
Most neural text-to-speech (TTS) models require
Guided-TTS: Text-to-Speech with Untranscribed Speech - Technology Org
Neural text-to-speech (TTS) models are successfully used to generate high-quality, human-like speech. However, most TTS models can be trained only if transcribed data from the desired speaker is available. That means long-form untranscribed data, such as podcasts, cannot be used to train existing models. A recent paper on arXiv proposes an unconditional diffusion-based generative model that is trained on untranscribed data and leverages a phoneme classifier for text-to-speech synthesis.
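The idea of steering an unconditional generative model with a separate classifier can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a one-dimensional standard normal in place of the speech model, a logistic function in place of the phoneme classifier, and plain Langevin sampling in place of a full diffusion sampler. All function names and parameters (`unconditional_score`, `classifier_log_prob_grad`, `scale`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def unconditional_score(x):
    # Gradient of the log-density of a standard normal. This is a toy
    # stand-in (assumption) for the score of an unconditional model
    # trained on untranscribed data.
    return -x

def classifier_log_prob_grad(x, w=2.0):
    # Gradient w.r.t. x of log sigmoid(w * x). This is a toy stand-in
    # (assumption) for the gradient of a phoneme classifier's
    # log-probability of the target label.
    return w * (1.0 - 1.0 / (1.0 + np.exp(-w * x)))

def guided_score(x, scale=1.0):
    # Classifier guidance: add the scaled classifier gradient to the
    # unconditional score so samples drift toward the target label.
    return unconditional_score(x) + scale * classifier_log_prob_grad(x)

def langevin_sample(score_fn, n_samples=2000, n_steps=200, step=0.05, seed=0):
    # Unadjusted Langevin dynamics, used here as a simple substitute
    # for a diffusion sampler.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_samples)
        x = x + step * score_fn(x) + np.sqrt(2.0 * step) * noise
    return x

unguided = langevin_sample(unconditional_score)
guided = langevin_sample(guided_score)
print("unguided mean:", unguided.mean())
print("guided mean:  ", guided.mean())
```

In this toy setting the unguided samples stay centered near zero, while the guided samples shift toward the region the classifier favors. The generative model itself never sees a label; only the classifier carries the conditioning information.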