Soon You Won't Be Able to Tell an AI From a Human Voice

The choppy, cybernetic voices of digital assistants like Siri may not sound so mechanical for much longer, thanks to a significant breakthrough in using artificial intelligence to generate realistic human speech. In a new paper, scientists at Google-owned AI shop DeepMind have unveiled WaveNet, a neural network that generates audio waveforms by predicting each new piece of sound from its own previous output. The result is dramatically more natural-sounding computerized speech: the researchers say it reduces the perceived gap between human and computer voices by over 50 percent, in both English and Chinese.

The system's predictive model is a far cry from the synthesized speech used by "digital assistant" apps like Siri. Those apps rely on a "concatenative" system, which stitches speech together from a library of fragments recorded by a single speaker (in Siri's case, voice actress Susan Bennett). WaveNet is instead trained on a massive database of recorded speech, then generates the raw waveform one audio sample at a time using what's known as an "autoregressive" model, meaning each sample of the waveform is predicted from the samples that preceded it. The neural net was developed from a similar model called PixelCNN, which does the same for images by predicting them one pixel at a time.
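The autoregressive idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's implementation: a trivial two-term recurrence stands in for the neural network, and the names `generate` and `make_sine_predictor` are hypothetical. What it shows is the loop structure: every new sample is computed from the ones before it and then fed back in as input for the next step.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a "model" maps the samples generated so far to the next sample.
Predictor = Callable[[Sequence[float]], float]


def generate(predict: Predictor, seed: List[float], num_samples: int) -> List[float]:
    """Autoregressive generation: each new sample is predicted from the
    samples that precede it, then appended so it conditions the next step."""
    samples = list(seed)
    for _ in range(num_samples):
        next_sample = predict(samples)  # condition on everything generated so far
        samples.append(next_sample)     # feed the output back in as input
    return samples


def make_sine_predictor(freq_hz: float, sample_rate: int) -> Predictor:
    """Toy stand-in for the neural network (an assumption, not WaveNet):
    the recurrence x[n] = 2*cos(w)*x[n-1] - x[n-2] reproduces a sine wave."""
    w = 2 * math.pi * freq_hz / sample_rate
    coeff = 2 * math.cos(w)

    def predict(history: Sequence[float]) -> float:
        # Only the last two samples matter in this toy; a real speech model
        # conditions on a much longer window of past samples.
        return coeff * history[-1] - history[-2]

    return predict


# Usage: generate 160 samples (10 ms at an assumed 16 kHz) of a 440 Hz tone.
SAMPLE_RATE = 16_000
FREQ = 440.0
w = 2 * math.pi * FREQ / SAMPLE_RATE
seed = [0.0, math.sin(w)]  # the first two samples of sin(n * w)
wave = generate(make_sine_predictor(FREQ, SAMPLE_RATE), seed, num_samples=160)
print(len(wave))  # 162: the 2 seed samples plus 160 generated ones
```

Nothing in this loop is looked up in a library of recorded fragments, which is the key contrast with concatenative synthesis: every sample is computed. In WaveNet the predictor is a deep neural network trained on recorded speech, and it outputs a probability distribution over the next sample's value rather than a single number.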