StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation
Chen, Xi; Song, Yuchen; Nakamura, Satoshi
arXiv.org Artificial Intelligence
EmphST-Bench

To guide algorithm exploration and evaluate our model, we design an evaluation pipeline for emphasis-preserving speech-to-speech translation (S2ST) systems. Because no ready-to-use benchmark exists for this important task, we use LLMs to translate the test set of the StressTest [21] corpus into the target language and then have human experts filter the results. This process yields a high-quality benchmark dataset, EmphST-Bench, with manually verified emphasis alignments between source and target utterances, ensuring reliable assessment of cross-lingual emphasis preservation. The human filtering step focuses on correcting discrepancies in semantic equivalence, contrastive focus, and emotional intensity, resulting in a robust evaluation set that closely mirrors real-world linguistic nuances. EmphST-Bench consists of carefully selected parallel samples from English (source) to Chinese (target), providing a standardized resource for evaluating stress-aware S2ST systems. We report the statistics of EmphST-Bench in Table 1.

Table 1: Statistics of the EmphST-Bench dataset.

Statistic            Value
Number of Samples    218
Avg.
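The construction procedure described above (LLM translation followed by expert filtering) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Sample`, `Review`, and `build_benchmark`, all field names, and the `translate` and `review` callables are hypothetical. The sketch also simplifies the human step to an accept/reject decision, whereas the text says experts correct discrepancies as well.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Sample:
    """One candidate benchmark item (hypothetical schema)."""
    source_text: str             # English source utterance
    source_emphasis: List[str]   # emphasized words in the source
    target_text: str = ""        # proposed Chinese translation
    target_emphasis: Tuple[str, ...] = ()  # emphasized spans in the target


@dataclass
class Review:
    """An expert's verdict on the three criteria named in the text."""
    semantic_ok: bool    # semantic equivalence holds
    focus_ok: bool       # contrastive focus is preserved
    intensity_ok: bool   # emotional intensity is preserved

    @property
    def accepted(self) -> bool:
        return self.semantic_ok and self.focus_ok and self.intensity_ok


def build_benchmark(
    samples: Iterable[Sample],
    translate: Callable[[Sample], Tuple[str, List[str]]],  # stand-in for the LLM step
    review: Callable[[Sample], Review],                    # stand-in for the human step
) -> List[Sample]:
    """Translate every sample, then keep only those the reviewer accepts."""
    kept = []
    for s in samples:
        text, emphasis = translate(s)
        candidate = Sample(s.source_text, s.source_emphasis, text, tuple(emphasis))
        if review(candidate).accepted:
            kept.append(candidate)
    return kept
```

Passing the translation and review steps in as callables keeps the sketch independent of any particular LLM API or annotation tool, neither of which is specified in the text.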
Oct-16-2025
- Technology: