Review for NeurIPS paper: A Spectral Energy Distance for Parallel Speech Synthesis
–Neural Information Processing Systems
This paper proposes a training strategy for parallel TTS based on a spectral energy distance. It relies on neither explicit likelihood optimization nor adversarial learning, which makes training more stable and consistent. In addition, the authors introduce a repulsive term that is shown to significantly improve the quality of the generated speech. When the approach is combined with adversarial training, speech quality improves further. Overall, this is an interesting work, technically solid and experimentally compelling.
Jan-26-2025, 20:11:51 GMT