Google's new text-to-speech system sounds convincingly human
Get ready for the little person living inside your phone and speaker to sound a lot more lifelike. Google believes it has reached a new milestone in the quest to make computer-generated speech indistinguishable from human speech with Tacotron 2, a system that trains neural networks to generate eerily natural-sounding speech from text, and the company has the audio samples to back up the claim.
In a research paper published earlier this month (not yet peer-reviewed), Google argues that previous approaches to text-to-speech (TTS) have so far failed to achieve a genuinely natural sound. Techniques such as concatenative synthesis, in which pre-recorded fragments of speech are stitched together, and statistical parametric speech synthesis have fallen short, Google says, explaining: "The audio produced by these systems often sounds muffled and unnatural compared to human speech." With Tacotron 2 (which is not the same as the world-ending super-weapon used by Lord Business), the company says it has combined ideas from its previous TTS systems, WaveNet and the first Tacotron, to reach a new level of fidelity.
Dec-28-2017, 14:26:21 GMT