Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

Yoshimura, Takenori; Takaki, Shinji; Nakamura, Kazuhiro; Oura, Keiichiro; Hono, Yukiya; Hashimoto, Kei; Nankaku, Yoshihiko; Tokuda, Keiichi

Nov-21-2022–arXiv.org Artificial Intelligence

This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system, with the aim of end-to-end controllable speech synthesis. Because the mel-cepstral synthesis filter is explicitly embedded in the neural waveform models of the proposed system, the voice characteristics and the pitch of synthesized speech can be closely controlled through a frequency warping parameter and the fundamental frequency, respectively. The mel-cepstral synthesis filter is implemented as a differentiable, GPU-friendly module so that the acoustic and waveform models of the proposed system can be optimized jointly in an end-to-end manner. Experiments show that the proposed system improves speech quality over a baseline system while maintaining controllability. The core PyTorch modules used in the experiments will be made publicly available on GitHub.
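
The abstract names two control knobs: the frequency warping parameter (alpha) for voice characteristics and the fundamental frequency (F0) for pitch. The following is a minimal standard-library Python sketch of how each knob enters a source-filter model; it is not the authors' PyTorch module. It assumes the standard mel-cepstral filter form H(z) = exp(sum_m c(m) z~^-m), where z~^-1 = (z^-1 - alpha) / (1 - alpha z^-1) is the first-order all-pass function, and it evaluates only the filter's log amplitude response rather than running an actual synthesis filter. The excitation is a simple pulse/noise source. All function names and the toy coefficients are hypothetical, chosen for illustration.

```python
import math
import random


def warped_frequency(omega: float, alpha: float) -> float:
    """Phase response beta(omega) of the first-order all-pass function
    z~^-1 = (z^-1 - alpha) / (1 - alpha * z^-1) at z = exp(j * omega).
    For |alpha| < 1 the second atan2 argument is positive, so atan2 == atan."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def log_amplitude(mcep: list[float], alpha: float, n_bins: int = 257) -> list[float]:
    """Natural-log amplitude of H(z) = exp(sum_m c(m) * z~^-m) on n_bins
    equally spaced frequencies in [0, pi]. Since z~^-m = exp(-j*m*beta),
    log|H| is the cosine series sum_m c(m) * cos(m * beta(omega))."""
    result = []
    for k in range(n_bins):
        beta = warped_frequency(math.pi * k / (n_bins - 1), alpha)
        result.append(sum(c * math.cos(m * beta) for m, c in enumerate(mcep)))
    return result


def peak_bin(values: list[float]) -> int:
    """Index of the largest value (location of the dominant spectral peak)."""
    return max(range(len(values)), key=values.__getitem__)


def pulse_noise_excitation(f0_per_frame: list[float], hop: int, fs: int,
                           seed: int = 0) -> list[float]:
    """Unit-power pulse train for voiced frames (f0 > 0) and Gaussian noise
    for unvoiced frames (f0 == 0); f0 in Hz, one value per hop samples."""
    rng = random.Random(seed)
    out: list[float] = []
    phase = 0.0
    for f0 in f0_per_frame:
        for _ in range(hop):
            if f0 > 0.0:
                phase += f0 / fs  # fraction of a pitch period per sample
                if phase >= 1.0:
                    phase -= 1.0
                    out.append(math.sqrt(fs / f0))  # amplitude^2 / period = 1
                else:
                    out.append(0.0)
            else:
                phase = 0.0
                out.append(rng.gauss(0.0, 1.0))
    return out


if __name__ == "__main__":
    toy_mcep = [0.0, 0.0, -1.0]  # hypothetical coefficients with a single peak
    for alpha in (0.0, 0.42):  # 0.42 is a value commonly used at 16 kHz
        print("alpha", alpha, "peak bin", peak_bin(log_amplitude(toy_mcep, alpha)))
    for f0 in (100.0, 200.0):  # 200 frames of 80 samples = 1 s at 16 kHz
        exc = pulse_noise_excitation([f0] * 200, hop=80, fs=16000)
        print("f0", f0, "pulses", sum(1 for x in exc if x != 0.0))
```

With these toy coefficients the dominant peak sits at bin 128 (omega = pi/2) for alpha = 0 and moves to a lower bin for alpha = 0.42, illustrating how the same coefficients yield a differently warped spectral envelope when alpha changes; doubling F0 roughly doubles the number of excitation pulses per second. In a tensor library the same arithmetic could be batched on a GPU and differentiated with respect to the coefficients; that port, and the actual synthesis filtering, are not shown here.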

artificial intelligence, machine learning, synthesis filter

Country:
- Asia
  - Vietnam > Hanoi
    - Hanoi (0.04)
  - Japan > Honshū
    - Chūbu > Aichi Prefecture > Nagoya (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Synthesis (0.95)
  - Machine Learning > Neural Networks
    - Deep Learning (0.88)