Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
