Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
