Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech
