Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data

Duret, Jarod, Parcollet, Titouan, Estève, Yannick

Jun-29-2023–arXiv.org Artificial Intelligence

We propose a method for emotion-preserving speech-to-speech translation that operates at the level of discrete speech units. Our approach relies on a multilingual emotion embedding that captures affective information in a language-independent manner. We show that this embedding can be used to predict the pitch and duration of speech units in a target language, allowing us to resynthesize the source speech signal with the same emotional content. We evaluate our approach on English and French speech signals and show that it outperforms a baseline method that does not use emotional information, including when the emotion embedding is extracted from a different language. Although this preliminary study does not directly address the machine translation problem, our results demonstrate the effectiveness of our approach for cross-lingual emotion preservation in the context of speech resynthesis.

artificial intelligence, machine learning, natural language

Country:
- Europe
  - France (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.14)

Genre:
- Research Report > New Finding (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Machine Translation (0.50)
  - Speech > Speech Recognition (0.48)
  - Machine Learning
    - Neural Networks (0.47)
    - Statistical Learning (0.46)
