Review for NeurIPS paper: Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Jan-24-2025, 18:18:02 GMT–Neural Information Processing Systems

Weaknesses: I was a little confused about how the grouped 1x1 convolutions interact with the coupling layers. If the standard (half-and-half) partitioning is used for the coupling layers and the grouped 1x1 convolutions never mix channels outside of their group of 4, then half of the channels will never be transformed by any coupling layer. I'm assuming the authors deal with this issue somehow (since the results are good), but I only briefly scanned the code and didn't want to work through all of the index gymnastics. I could see readers being confused by these missing details. Update: In their response, the authors said they will explain more of the details of the grouped 1x1 convolutions in their revised version.

generative flow, monotonic alignment search, vocoder, (7 more...)

Neural Information Processing Systems

Jan-24-2025, 18:18:02 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Vision > Optical Character Recognition (0.40)
  - Speech > Speech Synthesis (0.40)
  - Assistive Technologies (0.40)