WaveNet launches in the Google Assistant DeepMind
To understand why WaveNet improves on the current state of the art, it is useful to understand how text-to-speech (TTS) - or speech synthesis - systems work today. The majority of these are based on so-called concatenative TTS, which uses a large database of high-quality recordings, collected from a single voice actor over many hours. These recordings are split into tiny chunks that can then be combined - or concatenated - to form complete utterances as needed. However, these systems can result in unnatural sounding voices and are also difficult to modify because a whole new database needs to be recorded each time a set of changes, such as new emotions or intonations, are needed. To overcome some of these problems, an alternative model known as parametric TTS is sometimes used.
Oct-5-2017, 17:05:05 GMT
- Technology: