Microsoft's VALL-E AI can mimic any voice from a short audio sample

Jan-10-2023, 11:25:20 GMT–Engadget

Microsoft has shown off its latest research in text-to-speech AI: a model called VALL-E that can simulate someone's voice from just a three-second audio sample, Ars Technica has reported. The generated speech can match not only the speaker's timbre but also their emotional tone, and even the acoustics of the room. It could one day be used for customized or high-end text-to-speech applications, though, like deepfakes, it carries a risk of misuse. VALL-E is what Microsoft calls a "neural codec language model." It is derived from EnCodec, Meta's AI-powered audio compression neural network, and generates audio from text input and a short sample of the target speaker.

artificial intelligence, machine learning, training data

News Web Page

Technology:
- Information Technology > Artificial Intelligence
  - Vision (0.83)
  - Speech > Speech Synthesis (0.63)
  - Machine Learning > Neural Networks (0.58)