Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
