Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization
