IBM's AI generates high-quality voices from 5 minutes of talking


Training powerful text-to-speech models requires correspondingly powerful hardware. A recent study published by OpenAI drives the point home: it found that since 2012, the amount of compute used in the largest AI training runs grew by more than 300,000 times. In pursuit of less demanding models, researchers at IBM developed a new lightweight and modular method for speech synthesis. They say it can synthesize high-quality speech in real time by learning different aspects of a speaker's voice, which makes it possible to adapt to new speaking styles and voices with small amounts of data. "Recent advances in deep learning are dramatically improving the development of Text-to-Speech (TTS) systems through more effective and efficient learning of voice and speaking styles of speakers and more natural generation of high-quality output speech," wrote IBM researchers Zvi Kons, Slava Shechtman, and Alex Sorin in a blog post accompanying a preprint paper presented at Interspeech 2019.
