Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis