Reviews: FastSpeech: Fast, Robust and Controllable Text to Speech
–Neural Information Processing Systems
Originality: Although phoneme duration prediction is widely adopted in conventional TTS systems, jointly training it in a neural TTS model is new. This paper is one of the first works on non-autoregressive text-to-spectrogram modeling. Quality: The paper seems sound overall, except for a few issues noted in the comments below. Some of these issues must be addressed before acceptance. Clarity: The paper is well written. Significance: The advantages over its autoregressive counterparts are significant, especially for industrial use.
Jun-1-2025, 23:53:13 GMT