DeepMind Unveils WaveNet - A Deep Neural Network for Speech and Audio Synthesis
Google's DeepMind announced the WaveNet project, a fully convolutional, probabilistic and autoregressive deep neural network. Trained on raw audio waveforms, it can generate new speech and music, and according to DeepMind its speech sounds more natural than that of the best existing Text-To-Speech (TTS) systems. Speech synthesis today is largely based on concatenative TTS, in which a large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This approach is inflexible: the voice cannot easily be adapted to new inputs, and drastically altering its properties (for example, switching to a different speaker) often means rebuilding the dataset from scratch. DeepMind notes that while previous models typically depend on a large audio dataset from a single source, or a single person, WaveNet retains its models as sets of parameters that can be modified based on new input to an existing model.
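To make "fully convolutional" and "autoregressive" concrete: each new audio sample is predicted from the samples that came before it, using convolutions that only look backward in time (causal convolutions), with gaps between taps (dilations) so that a few layers cover a long stretch of audio. The plain-Python sketch below is a toy illustration under stated assumptions, not DeepMind's implementation. The names `causal_dilated_conv`, `receptive_field`, `generate` and `toy_predict` are hypothetical, the weights are arbitrary rather than learned, and the toy model outputs a single number instead of the probability distribution over sample values that a probabilistic model like WaveNet produces. The doubling dilations 1, 2, ..., 512 follow the scheme described in the WaveNet paper and are used here only to show how quickly the receptive field grows.

```python
from typing import Callable, List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Causal 1-D convolution: y[t] uses only x[t], x[t - dilation], x[t - 2*dilation], ...
    Positions before the start of the signal are treated as zeros, so no future sample is read."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples (current one included) that can influence a single output."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def generate(predict_next: Callable[[List[float]], float], seed: Sequence[float], n: int) -> List[float]:
    """Autoregressive loop: each new sample is predicted from everything produced so far,
    then appended so that it conditions the next prediction."""
    samples = list(seed)
    for _ in range(n):
        samples.append(predict_next(samples))
    return samples


def toy_predict(history: List[float]) -> float:
    """Hypothetical toy predictor: two causal layers (kernel size 2, dilations 1 and 2)
    with arbitrary, untrained weights; the last output is used as the next sample."""
    h = causal_dilated_conv(history, [0.5, 0.5], dilation=1)
    h = causal_dilated_conv(h, [0.5, 0.5], dilation=2)
    return h[-1]


if __name__ == "__main__":
    # One block of doubling dilations 1, 2, 4, ..., 512 with kernel size 2.
    print(receptive_field(2, [2 ** i for i in range(10)]))  # prints 1024
    # Generate two new samples from a three-sample seed.
    print(generate(toy_predict, [1.0, 0.0, 0.0], 2))  # prints [1.0, 0.0, 0.0, 0.25, 0.3125]
```

Because each convolution reads only past positions, generation must proceed one sample at a time, which is why the loop in `generate` feeds every output back in as input. The dilations are what let a shallow stack of layers condition each sample on roughly a thousand preceding ones.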
Sep-20-2016, 10:35:53 GMT