MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting

Shah, Neil, Tambrahalli, Vishal, Kosgi, Saiteja, Pedanekar, Niranjan, Gandhi, Vineet

May-19-2023–arXiv.org Artificial Intelligence

We present MParrotTTS, a unified multilingual, multi-speaker text-to-speech (TTS) synthesis model that can produce high-quality speech. Benefiting from a modularized training paradigm exploiting self-supervised speech representations, MParrotTTS adapts to a new language with minimal supervised data and generalizes to languages not seen while training the self-supervised backbone. Moreover, without training on any bilingual or parallel examples, MParrotTTS can transfer voices across languages while preserving the speaker-specific characteristics, e.g., synthesizing fluent Hindi speech using a French speaker's voice and accent. We present extensive results on six languages in terms of speech naturalness and speaker similarity in parallel and cross-lingual synthesis. The proposed model outperforms the state-of-the-art multilingual TTS models and baselines, using only a small fraction of supervised training data. Speech samples from our model can be found at https://paper2438.github.io/tts/

artificial intelligence, machine learning, representation, (13 more...)

arXiv.org Artificial Intelligence

May-19-2023

arXiv.org PDF

Add feedback

Country:
- Asia > India (0.28)
- North America > United States
  - Minnesota (0.28)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.93)
  - Speech > Speech Synthesis (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found