Neural Voice Cloning with a Few Samples

Arik, Sercan, Chen, Jitong, Peng, Kainan, Ping, Wei, Zhou, Yanqi

Feb-14-2020, 20:57:19 GMT–Neural Information Processing Systems

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which is then applied to a multi-speaker generative model.
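The contrast between the two approaches can be illustrated with a toy sketch. It is not the paper's architecture: the "multi-speaker generative model" here is an assumed linear map, the adaptation step fine-tunes only a speaker embedding (one simple choice of what to adapt), and the encoder is an assumed linear least-squares fit. All names (`W_text`, `W_spk`, `adapt_embedding`, `train_encoder`, `encode_speaker`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, EMB_DIM, AUDIO_DIM = 4, 3, 8

# Hypothetical stand-in for a trained multi-speaker generative model:
# audio = W_text @ text_features + W_spk @ speaker_embedding
W_text = rng.normal(size=(AUDIO_DIM, TEXT_DIM))
W_spk = rng.normal(size=(AUDIO_DIM, EMB_DIM))

def synthesize(text, emb):
    return W_text @ text + W_spk @ emb

def adapt_embedding(texts, audios, steps=2000):
    """Speaker adaptation (assumed embedding-only variant): fine-tune a new
    embedding by gradient descent while the model weights stay frozen."""
    lr = 1.0 / np.linalg.norm(W_spk, 2) ** 2  # 1 / largest curvature of the loss
    emb = np.zeros(EMB_DIM)
    for _ in range(steps):
        residuals = np.array([synthesize(t, emb) - a for t, a in zip(texts, audios)])
        emb -= lr * W_spk.T @ residuals.mean(axis=0)  # gradient of mean squared error / 2
    return emb

def train_encoder(train_audios, train_embs):
    """Speaker encoding: fit a separate (here linear) model from audio to embedding."""
    enc, *_ = np.linalg.lstsq(train_audios, train_embs, rcond=None)
    return enc  # shape (AUDIO_DIM, EMB_DIM)

def encode_speaker(enc, audios):
    """Infer an embedding directly from a few audio samples, with no fine-tuning."""
    return (np.asarray(audios) @ enc).mean(axis=0)

# Synthetic training speakers, used once to fit the encoder.
train_embs = rng.normal(size=(200, EMB_DIM))
train_texts = rng.normal(size=(200, TEXT_DIM))
train_audios = train_texts @ W_text.T + train_embs @ W_spk.T
enc = train_encoder(train_audios, train_embs)

# An unseen speaker with only five cloning samples.
true_emb = rng.normal(size=EMB_DIM)
clone_texts = rng.normal(size=(5, TEXT_DIM))
clone_audios = np.array([synthesize(t, true_emb) for t in clone_texts])

adapted = adapt_embedding(clone_texts, clone_audios)  # iterative, per-speaker optimization
encoded = encode_speaker(enc, clone_audios)           # single forward pass

print("adaptation error:", np.abs(adapted - true_emb).max())
print("encoding error:  ", np.abs(encoded - true_emb).max())
```

In this sketch, adaptation runs an optimization loop for each new speaker, while encoding pays its cost once when the encoder is trained and then needs only a forward pass per speaker. Either embedding can be passed to `synthesize` with new text.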

multi-speaker generative model, neural voice cloning, speaker adaptation, (1 more...)



Industry:
- Information Technology > Security & Privacy (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.93)
  - Machine Learning > Neural Networks (0.93)