FastSpeech: Fast, Robust and Controllable Text to Speech

Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Feb-15-2026, 03:49:24 GMT–Neural Information Processing Systems

Prominent methods (e.g., Tacotron 2) usually first generate a mel-spectrogram from text, and then synthesize speech from the mel-spectrogram using a vocoder such as WaveNet. Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

artificial intelligence, fastspeech, machine learning, (15 more...)
