ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

Xiao, Yujia; Zhang, Shaofei; Wang, Xi; Tan, Xu; He, Lei; Zhao, Sheng; Soong, Frank K.; Lee, Tan

Oct-7-2023–arXiv.org Artificial Intelligence

While state-of-the-art Text-to-Speech (TTS) systems can generate highly natural speech at the sentence level, they still face major challenges in paragraph-level, long-form reading. These deficiencies stem from i) neglect of cross-sentence contextual information, and ii) the high computation and memory cost of long-form synthesis. To address these issues, this work develops a lightweight yet effective TTS system, ContextSpeech. Specifically, we first design a memory-cached recurrence mechanism to incorporate global text and speech context into sentence encoding. We then construct hierarchically structured textual semantics to broaden the scope of global context enhancement. Additionally, we integrate linearized self-attention to improve model efficiency. Experiments show that ContextSpeech significantly improves voice quality and prosody expressiveness in paragraph reading with competitive model efficiency. Audio samples are available at: https://contextspeech.github.io/demo/
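The abstract does not say which linearization of self-attention ContextSpeech uses, so the following is only a minimal sketch of one common formulation: the kernel feature-map approach, where softmax similarity is replaced by a dot product of positive features. The function names (`feature_map`, `linear_attention`, `quadratic_reference`) and the choice of `elu(x) + 1` as the feature map are assumptions made for illustration, not details taken from the paper. The sketch shows why the cost grows linearly with sequence length: the key-value summary is built once and reused for every query.

```python
import math


def feature_map(x):
    # Assumed feature map: elu(v) + 1, which is strictly positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(queries, keys, values):
    """Non-causal linearized attention over lists of vectors.

    Builds one (d x d_v) summary of keys and values, then reuses it for
    every query, so the cost is linear in the number of positions.
    """
    d, dv = len(keys[0]), len(values[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum over positions of phi(k) v^T
    k_sum = [0.0] * d                    # sum over positions of phi(k)
    for k, v in zip(keys, values):
        pk = feature_map(k)
        for i in range(d):
            k_sum[i] += pk[i]
            for j in range(dv):
                kv[i][j] += pk[i] * v[j]
    out = []
    for q in queries:
        pq = feature_map(q)
        norm = sum(pq[i] * k_sum[i] for i in range(d))
        out.append(
            [sum(pq[i] * kv[i][j] for i in range(d)) / norm for j in range(dv)]
        )
    return out


def quadratic_reference(queries, keys, values):
    """Same result computed pairwise (quadratic in length), for comparison."""
    dv = len(values[0])
    out = []
    for q in queries:
        pq = feature_map(q)
        weights = [sum(a * b for a, b in zip(pq, feature_map(k))) for k in keys]
        total = sum(weights)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dv)]
        )
    return out
```

Both functions return the same outputs up to floating-point error; only the order of the summations differs, which is what removes the quadratic dependence on sequence length.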



