You Sound a Little Tense: L2 Tailored Clear TTS Using Durational Vowel Properties

Tuttösí, Paige; Yeung, H. Henny; Wang, Yue; Aucouturier, Jean-Julien; Lim, Angelica

Sep-4-2025–arXiv.org Artificial Intelligence

We present the first text-to-speech (TTS) system tailored to second language (L2) speakers. We use the duration difference between American English tense (longer) and lax (shorter) vowels to create a "clarity mode" for Matcha-TTS. Our perception studies showed that French-L1, English-L2 listeners made fewer transcription errors (at least 9.15% fewer) when using our clarity mode, and found it more encouraging and respectful than overall slowed-down speech. Remarkably, listeners were not aware of these effects: despite the lower word error rate in clarity mode, they still believed that slowing all target words was the most intelligible option, suggesting that actual intelligibility does not correlate with perceived intelligibility. Additionally, we found that Whisper-ASR did not use the same cues as L2 listeners to differentiate difficult vowels and is therefore not sufficient to assess the intelligibility of TTS systems for these individuals.
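
The abstract does not specify how clarity mode edits durations. As a minimal sketch, assuming access to per-phoneme ARPAbet symbols and integer frame durations of the kind a duration predictor such as Matcha-TTS's produces, one could lengthen tense vowels while leaving lax vowels untouched, which exaggerates the tense/lax length contrast. The vowel sets, `tense_scale`, `lax_scale`, and `clarity_durations` are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of a tense-vowel lengthening step; NOT the paper's code.
# Assumed ARPAbet subsets (stress digits stripped); the paper's actual vowel list may differ.
TENSE_VOWELS = {"IY", "EY", "UW", "OW"}
LAX_VOWELS = {"IH", "EH", "UH"}


def strip_stress(phone: str) -> str:
    """Drop the ARPAbet stress digit, e.g. 'IY1' -> 'IY'."""
    return phone.rstrip("012")


def clarity_durations(phones, durations, tense_scale=1.5, lax_scale=1.0):
    """Return new per-phone frame counts with tense vowels lengthened.

    phones: list of ARPAbet symbols; durations: list of int frame counts.
    tense_scale and lax_scale are hypothetical, illustrative parameters.
    """
    if len(phones) != len(durations):
        raise ValueError("phones and durations must have the same length")
    out = []
    for phone, frames in zip(phones, durations):
        base = strip_stress(phone)
        if base in TENSE_VOWELS:
            scale = tense_scale
        elif base in LAX_VOWELS:
            scale = lax_scale
        else:
            scale = 1.0  # consonants and other vowels keep their duration
        # Round to whole frames and keep at least one frame per phone.
        out.append(max(1, round(frames * scale)))
    return out


# "beat" (tense IY) is lengthened; "bit" (lax IH) is unchanged.
print(clarity_durations(["B", "IY1", "T"], [4, 10, 5]))  # [4, 15, 5]
print(clarity_durations(["B", "IH1", "T"], [4, 8, 5]))   # [4, 8, 5]
```

Unlike a uniform slowdown, which would multiply every phone's duration by the same factor, this sketch only stretches the vowels whose length carries the tense/lax contrast.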

machine learning, natural language, vowel

Country:
- North America > Canada (0.14)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.46)
  - Speech > Speech Synthesis (0.35)
