Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization