VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation

Badlani, Rohan, Arora, Akshit, Ghosh, Subhankar, Valle, Rafael, Shih, Kevin J., Santos, João Felipe, Ginsburg, Boris, Catanzaro, Bryan

Mar-13-2023–arXiv.org Artificial Intelligence

We introduce VANI, a very lightweight, multilingual, accent-controllable speech synthesis system. Our model builds upon the disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker, and fine-grained $F_0$ and energy features for speech synthesis. We use the Indic languages dataset, released for LIMMITS 2023 as part of the ICASSP Signal Processing Grand Challenge, to synthesize speech in three languages. Our model supports transferring the language of a speaker while retaining their voice and the native accent of the target language. We use the large-parameter RADMMM model for Track 1 and the lightweight VANI model for Tracks 2 and 3 of the competition.

artificial intelligence, cosine sim, speech synthesis

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (0.70)
  - Speech > Speech Synthesis (0.57)
