Neural Voice Cloning: Teaching Machines to Generate Speech
At Baidu Research, we aim to revolutionize human-machine interfaces with the latest artificial intelligence techniques. Our Deep Voice project was started a year ago, which focuses on teaching machines to generate speech from text that sounds more human-like. Beyond single-speaker speech synthesis, we demonstrated that a single system could learn to reproduce thousands of speaker identities, with less than half an hour of training data for each speaker. This capability was enabled by learning shared and discriminative information from speakers. We were motivated to push this idea even further, and attempted to learn speaker characteristics from only a few utterances (i.e., sentences of few seconds duration).
Mar-6-2018, 14:16:20 GMT