CML-TTS: A Multilingual Dataset for Speech Synthesis in Low-Resource Languages

Oliveira, Frederico S., Casanova, Edresson, Júnior, Arnaldo Cândido, Soares, Anderson S., Filho, Arlindo R. Galvão

Jun-16-2023–arXiv.org Artificial Intelligence

In this paper, we present CML-TTS, a recursive acronym for CML-Multi-Lingual-TTS, a new Text-to-Speech (TTS) dataset developed at the Center of Excellence in Artificial Intelligence (CEIA) of the Federal University of Goiás (UFG). CML-TTS is based on Multilingual LibriSpeech (MLS), adapted for training TTS models, and consists of audiobooks in seven languages: Dutch, French, German, Italian, Portuguese, Polish, and Spanish. Additionally, we provide the YourTTS model, a multilingual TTS model, trained on 3,176.13 hours from CML-TTS together with 245.07 hours of English speech from LibriTTS. Our purpose in creating this dataset is to open up new research possibilities for multilingual models in the TTS area. The dataset is publicly available under the CC-BY 4.0 license.

artificial intelligence, machine learning, natural language

Country:
- South America > Brazil (0.04)
- Europe
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)
  - Czechia > South Moravian Region
    - Brno (0.04)

Genre:
- Research Report (0.65)

Industry:
- Information Technology (0.47)
- Media (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Personal Assistant Systems (1.00)
  - Machine Learning > Neural Networks (1.00)
  - Speech > Speech Synthesis (0.75)
  - Natural Language
    - Chatbot (0.68)
    - Text Processing (0.68)
