Reviews: Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Oct-8-2024, 08:47:20 GMT–Neural Information Processing Systems

This paper presents a solid piece of work on a speaker-dependent neural TTS system, building on the earlier Deep Voice and Tacotron architectures. The key idea is to learn a speaker-dependent embedding vector jointly with the neural TTS model. The paper is clearly written, and the experiments are presented well. My comments are as follows. ASR researchers later found that using fixed speaker embeddings such as i-vectors can work equally well (or even better).
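To make the key idea concrete, here is a minimal NumPy sketch of a speaker embedding learned jointly with a model. It is a toy linear model, not the Deep Voice 2 or Tacotron architecture, and every name and size in it (`speaker_table`, `train_step`, the dimensions, the learning rate) is a hypothetical choice made for illustration. The point it shows is that one gradient step updates both the shared weights and the embedding row of the speaker who produced the example.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
N_SPEAKERS, EMB_DIM, IN_DIM, OUT_DIM = 4, 8, 16, 5

rng = np.random.default_rng(0)

# Trainable parameters: one embedding row per speaker, plus shared weights.
speaker_table = rng.normal(scale=0.1, size=(N_SPEAKERS, EMB_DIM))
W = rng.normal(scale=0.1, size=(OUT_DIM, IN_DIM + EMB_DIM))


def train_step(W, table, x, speaker_id, target, lr=0.01):
    """One SGD step on 0.5 * ||W @ [x; e_s] - target||^2.

    Updates W and the selected speaker's embedding row in place and
    returns the loss measured before the update.
    """
    h = np.concatenate([x, table[speaker_id]])  # text features + speaker embedding
    err = W @ h - target                        # gradient of the loss w.r.t. the output
    grad_W = np.outer(err, h)                   # gradient for the shared weights
    grad_emb = (W.T @ err)[x.size:]             # gradient for the embedding slice of h
    W -= lr * grad_W
    table[speaker_id] -= lr * grad_emb          # only this speaker's row changes
    return 0.5 * float(err @ err)
```

A fixed-embedding variant, such as the i-vector approach the reviewer mentions, would correspond to skipping the update of `table` and supplying precomputed vectors instead.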

deep voice 2, multi-speaker neural text-to-speech, speaker adaptation

Technology:
- Information Technology > Artificial Intelligence
  - Vision > Optical Character Recognition (0.40)
  - Speech > Speech Synthesis (0.40)
  - Assistive Technologies (0.40)