Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Andrew Gibiansky, Sercan Arik, Gregory Diamos, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

Nov-21-2025, 12:48:28 GMT–Neural Information Processing Systems

We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron.
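The general idea of conditioning one shared model on a low-dimensional trainable speaker embedding can be illustrated with a minimal sketch. This is not the paper's architecture: the class name `SpeakerConditioner`, the dimensions, the tanh projection, and the choice to add the result as a bias to a hidden activation are all assumptions made for illustration, and the weights here are random rather than trained.

```python
import math
import random

EMBED_DIM = 4    # low-dimensional speaker embedding (size is illustrative)
HIDDEN_DIM = 8   # width of a hypothetical shared hidden layer


def make_matrix(rows, cols, rng):
    """Small random matrix represented as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SpeakerConditioner:
    """Hypothetical helper: one vector per speaker, projected to the hidden size."""

    def __init__(self, num_speakers, seed=0):
        rng = random.Random(seed)
        # One row per speaker; in a real model these would be updated by backprop.
        self.embeddings = make_matrix(num_speakers, EMBED_DIM, rng)
        # Projection from embedding space to hidden space, shared by all speakers.
        self.projection = make_matrix(HIDDEN_DIM, EMBED_DIM, rng)

    def bias_for(self, speaker_id):
        """Project the speaker's embedding and squash it with tanh."""
        e = self.embeddings[speaker_id]
        return [math.tanh(sum(w * x for w, x in zip(row, e)))
                for row in self.projection]

    def condition(self, hidden, speaker_id):
        """Add the speaker-dependent bias to a shared hidden activation."""
        bias = self.bias_for(speaker_id)
        return [h + b for h, b in zip(hidden, bias)]


conditioner = SpeakerConditioner(num_speakers=3)
shared_hidden = [0.5] * HIDDEN_DIM  # same text-derived activation for every speaker
out_a = conditioner.condition(shared_hidden, speaker_id=0)
out_b = conditioner.condition(shared_hidden, speaker_id=1)
```

The same `shared_hidden` input yields different outputs for different speaker IDs, which is the sense in which a single set of shared weights can produce multiple voices in this sketch.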

artificial intelligence, machine learning, Deep Voice 2

Country:
- Asia (0.04)
- North America > United States
  - California
    - Santa Clara County > Sunnyvale (0.04)
    - Los Angeles County > Long Beach (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Synthesis (0.88)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)
