Text-to-Speech: One Small Step by Mankind to Create Lifelike Robots
Note: For those of you who prefer watching videos, please feel free to play the above video on the same content. While speech synthesis has come a long way since Kratzenstein's vowel organ that could produce the five vowel sounds, it is a whole'nother level of challenge to transform text to natural-sounding speech. Recent developments in deep learning have provided us a new approach to the challenge and in this article, we shall briefly introduce a mainstream text-to-speech method before the deep learning era, then explore models like WaveNet that Google's text-to-speech API service is now using for lifelike speech synthesis. If you pause and think for a moment about how you can perform text-to-speech, you would probably formulate a method that is very similar to the concatenative approach. In concatenative text-to-speech, texts are broken down into smaller units such as phonemes, and the corresponding recordings of the units are then combined to form a complete speech.
Mar-27-2021, 17:45:14 GMT
- Technology: