AITopics | multi-speaker generative model

Neural Voice Cloning with a Few Samples

Neural Information Processing Systems | Mar-16-2026, 20:26:23 GMT

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with a few cloning audios. While speaker adaptation can achieve slightly better naturalness and similarity, cloning time and required memory for the speaker encoding approach are significantly less, making it more favorable for low-resource deployment.
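The two approaches named in the abstract can be contrasted with a toy sketch. This is not the authors' system: the 2x2 matrix `W` stands in for a frozen multi-speaker generative model, the sample values are made up, and the function names `generate`, `adapt_embedding`, and `encode_speaker` are hypothetical. The adaptation variant shown here fine-tunes only a speaker embedding (a simplification), and the "encoder" is a hand-set inverse of `W` rather than a trained network.

```python
# Toy stand-in for a multi-speaker generative model: output = W @ embedding.
# W, the samples, and all function names are hypothetical illustrations.
W = [[2.0, 0.0],
     [0.0, 0.5]]


def generate(embedding):
    """Frozen 'generative model': maps a speaker embedding to features."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in W]


def adapt_embedding(samples, steps=500, lr=0.1):
    """Speaker adaptation (embedding-only simplification): fit a new
    embedding by gradient descent on squared error to the cloning samples."""
    emb = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for target in samples:
            out = generate(emb)
            err = [o - t for o, t in zip(out, target)]
            # Gradient of 0.5 * ||W emb - target||^2 w.r.t. emb is W^T err.
            for j in range(2):
                grad[j] += sum(W[i][j] * err[i] for i in range(2)) / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb


def encode_speaker(samples):
    """Speaker encoding: a separate model infers the embedding in one pass.
    Here the 'encoder' is a hand-set inverse of W applied to the mean sample,
    standing in for a network trained ahead of time."""
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
    return [mean[0] / W[0][0], mean[1] / W[1][1]]


# Three made-up "cloning samples" from one unseen speaker.
samples = [[2.0, 0.5], [2.2, 0.4], [1.8, 0.6]]
adapted = adapt_embedding(samples)   # many gradient steps per new speaker
encoded = encode_speaker(samples)    # a single forward pass per new speaker
```

In this toy setting both routes land on nearly the same embedding, but adaptation pays for an optimization loop per new speaker while encoding needs one pass, which mirrors the cloning-time trade-off the abstract describes.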

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Neural Voice Cloning with a Few Samples

Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Neural Information Processing Systems | Feb-12-2026, 17:28:44 GMT

Neural Information Processing Systems http://nips.cc/

adaptation, generative model, speaker adaptation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia (0.04)

Industry: Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Voice Cloning with a Few Samples

Neural Information Processing Systems | Nov-20-2025, 22:07:54 GMT

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with a few cloning audios. While speaker adaptation can achieve slightly better naturalness and similarity, cloning time and required memory for the speaker encoding approach are significantly less, making it more favorable for low-resource deployment.

multi-speaker generative model, name change, neural voice cloning, (3 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Neural Voice Cloning with a Few Samples

Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Neural Information Processing Systems | Nov-20-2025, 16:03:29 GMT

Voice cloning is a highly desired feature for personalized speech interfaces.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia (0.04)

Industry: Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Voice Cloning with a Few Samples

Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Neural Information Processing Systems | Feb-14-2020, 20:57:19 GMT

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model.

multi-speaker generative model, neural voice cloning, speaker adaptation, (1 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Voice Cloning with a Few Samples

Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Neural Information Processing Systems | Dec-31-2018

Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which will be applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with a few cloning audios. While speaker adaptation can achieve slightly better naturalness and similarity, cloning time and required memory for the speaker encoding approach are significantly less, making it more favorable for low-resource deployment.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America (0.28)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Synced Baidu AI Can Clone Your Voice in Seconds

@machinelearnbot | Feb-22-2018, 19:20:13 GMT

Baidu's research arm announced yesterday that its 2017 text-to-speech (TTS) system Deep Voice has learned how to imitate a person's voice using a mere three seconds of voice sample data. The technique, known as voice cloning, could be used to personalize virtual assistants such as Apple's Siri, Google Assistant, Amazon Alexa, and Baidu's Mandarin virtual assistant platform DuerOS, which supports 50 million devices in China with human-machine conversational interfaces. In healthcare, voice cloning has helped patients who lost their voices by building a duplicate. Voice cloning may even find traction in the entertainment industry and in social media as a tool for satirists. Baidu researchers implemented two approaches: speaker adaptation and speaker encoding.

baidu, machine learning, natural language, (8 more...)

@machinelearnbot

Country: Asia > China (0.26)

Industry: Information Technology > Security & Privacy (0.90)

Technology: