The Morning After: Microsoft's VALL-E AI can replicate a voice from a three-second sample

Engadget 

While multiple services can already create copies of your voice, they usually demand a substantial amount of recorded audio. Microsoft claims its VALL-E model can simulate someone's voice from just a three-second audio sample. The generated speech can match the speaker's timbre and emotional tone, and even the acoustics of a room. It could one day be used for customized or high-end text-to-speech applications, but, as with deepfakes, there are risks of misuse. Researchers trained VALL-E on 60,000 hours of English-language speech from 7,000-plus speakers in Meta's Libri-Light audio library.