Unsupervised Polyglot Text To Speech

Eliya Nachmani, Lior Wolf

arXiv.org Machine Learning 

ABSTRACT

We present a neural TTS network that can produce speech in multiple languages. The network can transfer a voice, presented as a sample in a source language, into one of several target languages. The conversion relies on learning a polyglot network with multiple per-language sub-networks and on adding loss terms that preserve the speaker's identity across languages. We evaluate the proposed polyglot network on three languages with a total of more than 400 speakers and demonstrate convincing conversion capabilities.

Index Terms: TTS, multilingual, unsupervised learning

1. INTRODUCTION

Neural text-to-speech (TTS) is an emerging technology that is becoming dominant over alternative TTS technologies in both quality and flexibility.
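The abstract describes the architecture only at a high level. Below is a minimal Python sketch, under stated assumptions, of the two ideas it names: routing input text through a per-language sub-network, and a loss term that penalizes a change in speaker identity. The split into per-language encoders plus one shared, speaker-conditioned decoder, the names `PolyglotTTS` and `identity_loss`, and the use of cosine distance between speaker embeddings are all illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identity_loss(source_emb: Vector, converted_emb: Vector) -> float:
    """Hypothetical identity-preservation term (assumption: cosine distance).

    Returns 0 when the speaker embedding of the converted speech points in the
    same direction as the embedding of the source-language sample, and grows
    as the two embeddings diverge.
    """
    return 1.0 - cosine_similarity(source_emb, converted_emb)


class PolyglotTTS:
    """Toy polyglot model: one encoder per language, one shared decoder.

    Assumption: only the text encoder is language-specific, while the decoder
    is shared and conditioned on a speaker embedding, so the same speaker
    vector can be reused with any target language.
    """

    def __init__(
        self,
        encoders: Dict[str, Callable[[str], Vector]],
        decoder: Callable[[Vector, Vector], Vector],
    ) -> None:
        self.encoders = encoders
        self.decoder = decoder

    def synthesize(self, text: str, language: str, speaker: Vector) -> Vector:
        # Pick the sub-network for the requested target language.
        if language not in self.encoders:
            raise KeyError(f"no sub-network for language {language!r}")
        hidden = self.encoders[language](text)
        # The shared decoder combines the text encoding with the speaker vector.
        return self.decoder(hidden, speaker)


if __name__ == "__main__":
    # Hypothetical stand-ins for trained sub-networks.
    toy_encoders = {
        "en": lambda t: [float(len(t))],
        "es": lambda t: [2.0 * len(t)],
    }
    toy_decoder = lambda h, s: [h[0] * s[0]]
    tts = PolyglotTTS(toy_encoders, toy_decoder)
    print(tts.synthesize("hola", "es", [0.5]))  # prints [4.0]
    print(identity_loss([1.0, 0.0], [0.0, 1.0]))  # prints 1.0
```

In a real system the encoders and decoder would be trained networks and the speaker embeddings would come from a speaker-recognition model applied to the generated audio; the toy lambdas here only show how a single speaker vector is shared across language-specific sub-networks.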

