Baidu's Deep Voice can quickly synthesize realistic human speech
Google's WaveNet can also synthesize realistic human speech, but it is computationally demanding and hard to use in real-world applications at this point. Baidu says it solved WaveNet's problem by using deep-learning techniques to convert text to phonemes, the smallest units of speech. It then turns those phonemes into sounds using its speech synthesis network. The system converts the word "hello," for instance, into "(silence, HH), (HH, EH), (EH, L), (L, OW), (OW, silence)" before the speech network pronounces it. Both steps rely on deep learning and don't need human input.
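The "hello" example pairs each phoneme with its neighbor and pads both ends with silence. The short Python sketch below reproduces only that pairing format. It is an illustration, not Baidu's code: the function name phoneme_pairs is hypothetical, and the phoneme list for "hello" is written by hand here, whereas Deep Voice is described as predicting phonemes from text with a learned model.

```python
def phoneme_pairs(phonemes, boundary="silence"):
    """Pair each phoneme with its successor, padding both ends with a boundary marker.

    Hypothetical helper for illustration; it only mirrors the output format
    quoted in the article and does not involve any learned model.
    """
    padded = [boundary, *phonemes, boundary]
    # zip the padded sequence against itself shifted by one to get adjacent pairs
    return list(zip(padded, padded[1:]))


# Hand-written phonemes for "hello" (assumed input, not model output)
pairs = phoneme_pairs(["HH", "EH", "L", "OW"])
print(pairs)
```

For the four phonemes of "hello" this yields five pairs, matching the sequence quoted above.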
Mar-9-2017, 08:00:06 GMT