Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
