SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level
Hitomi Jin Ling Tee, Chaoren Wang, Zijie Zhang, Zhizheng Wu
arXiv.org Artificial Intelligence
ABSTRACT The evaluation of TTS intelligibility has reached a bottleneck: existing assessments rely heavily on word-by-word accuracy metrics such as word error rate (WER), which fail to capture the complexity of real-world speech or to reflect human comprehension needs. To address this, we propose SP-MCQA (Spoken-Passage Multiple-Choice Question Answering), a novel subjective approach that evaluates the accuracy of key information in synthesized speech, and we release SP-MCQA-Eval, an 8.76-hour news-style benchmark dataset for SP-MCQA evaluation. Our experiments reveal that low WER does not necessarily guarantee high key-information accuracy, exposing a gap between traditional metrics and practical intelligibility. SP-MCQA shows that even state-of-the-art (SOTA) models still lack robust text normalization and phonetic accuracy. This work underscores the urgent need for higher-level, more life-like evaluation criteria now that many systems already excel at WER yet may fall short on real-world intelligibility.
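For context, the WER metric the abstract critiques is conventionally computed as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the reference length. A minimal sketch (illustrative only; the paper's exact scoring pipeline and any text normalization applied before scoring are not specified here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# A text-normalization mismatch of the kind the abstract flags: the words
# are arguably "correct" to a listener, yet WER scores them as total error.
print(wer("twenty twenty five", "20 25"))  # → 1.0
```

Note how surface-form differences (numerals vs. spelled-out numbers) inflate WER even when the spoken content is intelligible, which is one motivation for a comprehension-based metric like SP-MCQA.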
Oct-31-2025