AITopics | high-quality speech

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models

Wu, Weihao, Lin, Zhiwei, Zhou, Yixuan, Li, Jingbei, Niu, Rui, Wu, Qinghua, Cao, Songjun, Ma, Long, Wu, Zhiyong

arXiv.org Artificial Intelligence, Feb-27-2025

Conversational speech synthesis (CSS) aims to synthesize both contextually appropriate and expressive speech, and considerable efforts have been made to enhance the understanding of conversational context. However, existing CSS systems are limited to deterministic prediction, overlooking the diversity of potential responses. Moreover, they rarely employ language model (LM)-based TTS backbones, limiting the naturalness and quality of synthesized speech. To address these issues, in this paper, we propose DiffCSS, an innovative CSS framework that leverages diffusion models and an LM-based TTS backbone to generate diverse, expressive, and contextually coherent speech. A diffusion-based context-aware prosody predictor is proposed to sample diverse prosody embeddings conditioned on multimodal conversational context. Then a prosody-controllable LM-based TTS backbone is developed to synthesize high-quality speech with sampled prosody embeddings. Experimental results demonstrate that the synthesized speech from DiffCSS is more diverse, contextually coherent, and expressive than existing CSS systems.

conversational context, prosody, speech, (12 more...)

arXiv:2502.19924

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Hong Kong (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.89)

IBM's AI generates high-quality voices from 5 minutes of talking

#artificialintelligence, Oct-1-2019, 04:51:12 GMT

Training powerful text-to-speech models requires sufficiently powerful hardware. A recent study published by OpenAI drives the point home: it found that since 2012, the amount of compute used in the largest AI training runs grew by more than 300,000 times. In pursuit of less demanding models, researchers at IBM developed a new lightweight and modular method for speech synthesis. They say it can synthesize high-quality speech in real time by learning different aspects of a speaker's voice, making it possible to adapt to new speaking styles and voices with small amounts of data. "Recent advances in deep learning are dramatically improving the development of Text-to-Speech (TTS) systems through more effective and efficient learning of voice and speaking styles of speakers and more natural generation of high-quality output speech," wrote IBM researchers Zvi Kons, Slava Shechtman, and Alex Sorin in a blog post accompanying a preprint paper presented at Interspeech 2019.

ai generate high-quality voice, high-quality speech, speech synthesis, (7 more...)

Genre: Research Report > New Finding (0.58)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.97)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)
