On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models