Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
