EmphAssess: A Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models

de Seyssel, Maureen; D'Avirro, Antony; Williams, Adina; Dupoux, Emmanuel

Dec-21-2023–arXiv.org Artificial Intelligence

We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.
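
To make the frame-versus-word distinction concrete, here is a minimal sketch of how frame-level emphasis scores might be pooled into word-level decisions and then compared between an input and an output utterance. It is not the authors' EmphaClass model or the EmphAssess scoring procedure: the `Word` type, the mean-pooling rule, the 0.5 threshold, and the set-based F1 are all assumptions chosen for illustration. The comparison matches words by their text, so it only fits the resynthesis case; translation would additionally need a word alignment between languages.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word span over a sequence of frames."""
    text: str
    start: int  # first frame index (inclusive)
    end: int    # last frame index (exclusive)


def word_level_emphasis(frame_probs, words, threshold=0.5):
    """Return the words whose mean frame probability reaches the threshold.

    Assumption: mean pooling with a fixed threshold; other pooling
    rules (max, majority vote) would be equally plausible.
    """
    emphasized = []
    for word in words:
        frames = frame_probs[word.start:word.end]
        if frames and sum(frames) / len(frames) >= threshold:
            emphasized.append(word.text)
    return emphasized


def emphasis_f1(source_words, output_words):
    """F1 between the sets of emphasized words in source and output."""
    src, out = set(source_words), set(output_words)
    true_pos = len(src & out)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(out)
    recall = true_pos / len(src)
    return 2 * precision * recall / (precision + recall)


# Toy input: six frames, three words of two frames each.
frame_probs = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1]
words = [Word("I", 0, 2), Word("never", 2, 4), Word("said", 4, 6)]
source_emphasis = word_level_emphasis(frame_probs, words)
score = emphasis_f1(source_emphasis, ["never"])
```

In this toy input only the middle word has a mean frame score above the threshold, so it is the only word marked as emphasized, and an output that emphasizes the same word scores a perfect match.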

emphasis, machine learning, natural language
