Microsoft Launched VALL-E, A Voice DALL-E
Microsoft has recently introduced VALL-E, a new language model for text-to-speech (TTS) synthesis that uses discrete audio codec codes as its intermediate representation. After being trained on 60,000 hours of English speech data, it demonstrated in-context learning abilities in zero-shot scenarios. VALL-E can synthesize high-quality, personalized speech from just a 3-second recording of an unseen speaker used as an acoustic prompt. It supports prompt-based, zero-shot TTS through in-context learning, with no need for additional structure engineering or pre-designed acoustic features.
Jan-7-2023, 09:25:04 GMT