StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation

Chen, Xi, Song, Yuchen, Nakamura, Satoshi

arXiv.org Artificial Intelligence 

EmphST-Bench

To guide algorithm exploration and evaluate the performance of our model, we design an evaluation pipeline for emphasis-preserving speech-to-speech translation systems. Because no ready-to-use benchmark exists for this task, we use LLMs to translate the test set of the StressTest [21] corpus into the target language and then have human experts filter the results. This process yields a high-quality benchmark dataset, EmphST-Bench, with manually verified emphasis alignments between source and target utterances, ensuring reliable assessment of cross-lingual emphasis preservation. The human filtering step focuses on correcting discrepancies in semantic equivalence, contrastive focus, and emotional intensity, resulting in a robust evaluation set that closely mirrors real-world linguistic nuances. EmphST-Bench consists of carefully selected parallel samples from English (source) to Chinese (target), providing a standardized resource for evaluating stress-aware S2ST systems. We report the statistics of EmphST-Bench in Table 1.

Table 1: Statistics of the EmphST-Bench dataset.

Statistic            Value
Number of Samples    218
Avg.