AITopics | guided-tts

Collaborating Authors

guided-tts

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Guided-TTS:Text-to-Speech with Untranscribed Speech

Kim, Heeseung, Kim, Sungwon, Yoon, Sungroh

arXiv.org Artificial IntelligenceDec-7-2021

Most neural text-to-speech (TTS) models require paired data from the desired speaker for high-quality speech synthesis, which limits the usage of large amounts of untranscribed data for training. In this work, we present Guided-TTS, a high-quality TTS model that learns to generate speech from untranscribed speech data. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained phoneme classifier for text-to-speech. By modeling the unconditional distribution for speech, our model can utilize the untranscribed data for training. For text-to-speech synthesis, we guide the generative process of the unconditional DDPM via phoneme classification to produce mel-spectrograms from the conditional distribution given transcript. We show that Guided-TTS achieves comparable performance with the existing methods without any transcript for LJSpeech. Our results further show that a single speaker-dependent phoneme classifier trained on multispeaker large-scale data can guide unconditional DDPMs for various speakers to perform TTS.

classifier, guided-tts, phoneme classifier, (16 more...)

arXiv.org Artificial Intelligence

2111.11755

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Guided-TTS: Text-to-Speech with Untranscribed Speech - Technology Org

#artificialintelligenceNov-27-2021, 10:05:57 GMT

Neural text-to-speech (TTS) models are successfully used to generate high-quality human-like speech. However, most TTS models can be trained if only the transcribed data of the desired speaker is given. That means that long-form untranscribed data, such as podcasts, cannot be used to train existing models. A recent paper on arXiv proposes an unconditional diffusion-based generative model. It is trained on untranscribed data that leverages a phoneme classifier for text-to-speech synthesis.

artificial intelligence, optical character recognition, untranscribed speech, (11 more...)

#artificialintelligence

Genre: Research Report (0.36)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.98)
Information Technology > Artificial Intelligence > Assistive Technologies (0.98)

Add feedback