Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Andrew Gibiansky, Sercan Arik, Gregory Diamos, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
Neural Information Processing Systems
We introduce a technique for augmenting neural text-to-speech (TTS) with low-dimensional trainable speaker embeddings to generate different voices from a single model. As a starting point, we show improvements over the two state-of-the-art approaches for single-speaker neural TTS: Deep Voice 1 and Tacotron.