Guided-TTS: Text-to-Speech with Untranscribed Speech - Technology Org


Neural text-to-speech (TTS) models can generate high-quality, human-like speech. However, most TTS models can be trained only if transcribed data from the target speaker is available. That means long-form untranscribed data, such as podcasts, cannot be used to train existing models. A recent paper on arXiv proposes Guided-TTS, which uses an unconditional diffusion-based generative model trained on untranscribed speech. For text-to-speech synthesis, the model's generation is guided by a phoneme classifier.
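
The idea of steering an unconditional diffusion model with a classifier can be sketched with the general classifier-guidance identity below. This is a generic illustration, not necessarily the paper's exact formulation; the symbols are assumptions introduced here, with x_t standing for the noisy speech representation at diffusion step t and y for the phoneme labels derived from the input text.

```latex
% Bayes' rule: p(x_t | y) = p(x_t) p(y | x_t) / p(y), and p(y) does not depend on x_t
\nabla_{x_t} \log p(x_t \mid y)
  = \nabla_{x_t} \log p(x_t)
  + \nabla_{x_t} \log p(y \mid x_t)
```

The first term on the right is what an unconditional model can learn from untranscribed speech alone, while the second term is the gradient of a classifier's log-probability, which is where the text enters.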
