Reviews: Deep Voice 2: Multi-Speaker Neural Text-to-Speech
–Neural Information Processing Systems
This paper presents a solid piece of work on a speaker-dependent neural TTS system, building on the earlier Deep Voice and Tacotron architectures. The key idea is to learn a speaker-dependent embedding vector jointly with the neural TTS model. The paper is clearly written, and the experiments are well presented. My comments are as follows. ASR researchers later found that fixed speaker embeddings such as i-vectors can work equally well (or even better).
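To make the distinction between jointly learned and fixed speaker embeddings concrete, here is a minimal sketch. It is not the paper's architecture: the toy linear model, the function names (`make_embeddings`, `predict`, `sgd_step`) and the `train_embeddings` flag are all hypothetical, chosen only to show that a jointly trained embedding receives gradient updates along with the model weights, while a fixed one (in the spirit of an i-vector) is used as a constant input.

```python
import random


def make_embeddings(num_speakers, dim, seed=0):
    """Create one small random embedding vector per speaker (toy initialisation)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_speakers)]


def predict(weights, embedding, features):
    """Toy linear model conditioned on the speaker by input concatenation."""
    inputs = features + embedding
    return sum(w * x for w, x in zip(weights, inputs))


def sgd_step(weights, embeddings, speaker_id, features, target,
             lr=0.1, train_embeddings=True):
    """One SGD step on the squared error 0.5 * (prediction - target) ** 2.

    With train_embeddings=True the speaker's embedding is updated together
    with the model weights (joint learning). With False the embedding is
    treated as a fixed input and only the weights change.
    """
    embedding = embeddings[speaker_id]
    inputs = features + embedding
    error = predict(weights, embedding, features) - target

    # Gradient w.r.t. the embedding uses the weights that multiply it.
    n_feat = len(features)
    grad_embedding = [error * w for w in weights[n_feat:]]

    # Gradient w.r.t. each weight is the error times its input.
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]

    if train_embeddings:
        embeddings[speaker_id] = [e - lr * g
                                  for e, g in zip(embedding, grad_embedding)]
    return new_weights, 0.5 * error * error


if __name__ == "__main__":
    table = make_embeddings(num_speakers=2, dim=2)
    weights = [0.5, -0.3, 0.2, 0.1]  # 2 feature weights + 2 embedding weights
    weights, loss = sgd_step(weights, table, speaker_id=0,
                             features=[1.0, 2.0], target=1.0)
    print("loss:", loss)
    print("updated embedding for speaker 0:", table[0])
```

Only the embedding of the speaker seen in the training example is updated; the other rows of the table stay untouched, which is the usual behaviour of an embedding lookup.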
Oct-8-2024, 08:47:20 GMT