Neural Voice Cloning with a Few Samples
Arik, Sercan, Chen, Jitong, Peng, Kainan, Ping, Wei, Zhou, Yanqi
–Neural Information Processing Systems
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model.
Neural Information Processing Systems
Feb-14-2020, 20:57:19 GMT