Microsoft's VALL-E AI can mimic any voice from a short audio sample
Microsoft has shown off its latest text-to-speech AI research: a model called VALL-E that can simulate a person's voice from just a three-second audio sample, Ars Technica has reported. The generated speech can match not only the speaker's timbre but also their emotional tone, and even the acoustics of the room. It could one day be used for customized or high-end text-to-speech applications, though, like deepfakes, it carries a risk of misuse. VALL-E is what Microsoft calls a "neural codec language model." It is derived from EnCodec, Meta's AI-powered audio compression neural net, and generates audio from text input and short samples from the target speaker.
Jan-10-2023, 11:25:20 GMT