Microsoft Launched VALL-E, A Voice DALL-E

#artificialintelligence 

Microsoft has recently introduced VALL-E, a new language model for text-to-speech (TTS) synthesis that uses discrete audio codec codes as its intermediate representation. After being trained on 60,000 hours of English speech data, it demonstrated in-context learning abilities in zero-shot settings. VALL-E can generate high-quality, personalized speech from just a 3-second recording of an unseen speaker, used as an acoustic prompt. This enables prompt-based TTS that is zero-shot and context-aware. It requires no additional structure engineering or pre-designed acoustic features.
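
To make the prompt-based setup above more concrete, here is a minimal Python sketch of the kind of input such a model might be conditioned on: the text to speak plus codec codes from a short speaker recording. This is not Microsoft's code or API. The names `TTSPrompt`, `prompt_frame_count`, and `build_prompt` are hypothetical, and the frame rate and codebook count are assumed values chosen for illustration; the article does not state them.

```python
from dataclasses import dataclass
from typing import List

# Assumed values for illustration only; the article does not give them.
FRAMES_PER_SECOND = 75  # hypothetical codec frame rate
NUM_CODEBOOKS = 8       # hypothetical number of codec codes per frame


@dataclass
class TTSPrompt:
    """Hypothetical container for the two conditioning inputs."""
    text_tokens: List[str]           # tokens of the text to be spoken
    acoustic_codes: List[List[int]]  # codec codes, one list per audio frame


def prompt_frame_count(seconds: float,
                       frames_per_second: int = FRAMES_PER_SECOND) -> int:
    """Number of codec frames covering a recording of the given length."""
    return round(seconds * frames_per_second)


def build_prompt(text_tokens: List[str],
                 acoustic_codes: List[List[int]]) -> TTSPrompt:
    """Validate the frame shape and bundle text with the acoustic prompt."""
    for frame in acoustic_codes:
        if len(frame) != NUM_CODEBOOKS:
            raise ValueError("each frame needs one code per codebook")
    return TTSPrompt(text_tokens=text_tokens, acoustic_codes=acoustic_codes)


# A 3-second speaker sample, filled with placeholder codes (all zeros).
frames = prompt_frame_count(3.0)
dummy_codes = [[0] * NUM_CODEBOOKS for _ in range(frames)]
prompt = build_prompt(["hello", "world"], dummy_codes)
print(len(prompt.acoustic_codes), "acoustic frames in the prompt")
```

Under these assumptions, a 3-second sample is only a few hundred frames of discrete codes, which is why it can be treated as a prompt for a language model rather than as training data for a per-speaker model.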
