Why Our Crazy-Smart AI Still Sucks at Transcribing Speech

WIRED

In an age when technology companies routinely introduce new forms of everyday magic, one problem that remains seemingly unsolved is that of long-form transcription. Sure, voice dictation for documents has been conquered by Nuance's Dragon software. Our phones and smart home devices can understand fairly complex commands, thanks to self-teaching recurrent neural nets and other 21st-century wonders. Yet the task of accurately transcribing long blocks of actual human conversation remains beyond the abilities of even today's most advanced software. Solved on a broad scale, it could unlock vast archives of oral histories, make podcasts easier to consume for speed-readers (tl;dl), and be a world-changing boon for journalists everywhere, liberating precious hours of sweet life.
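The gap the piece describes is easy to feel in practice: per-utterance recognizers handle short commands well, but a long recording has to be chopped up and fed through them piece by piece, discarding speaker turns and context along the way. Below is a minimal sketch of that workaround using the open-source speech_recognition and pydub packages; the file name and silence thresholds are illustrative assumptions, not a recommended setup.

```python
# Naive long-form transcription: split a long recording at pauses,
# then run each chunk through a per-utterance recognizer.
# Sketch only -- "interview.wav" and the threshold values are placeholders.
import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

recognizer = sr.Recognizer()
recording = AudioSegment.from_wav("interview.wav")  # hypothetical input

# Break the audio wherever it goes quiet. Real conversation rarely
# pauses cleanly, which is one reason this approach degrades on
# long-form speech.
chunks = split_on_silence(recording, min_silence_len=700, silence_thresh=-40)

transcript = []
for chunk in chunks:
    chunk.export("chunk.wav", format="wav")  # recognizer reads from a file
    with sr.AudioFile("chunk.wav") as source:
        segment = recognizer.record(source)
    try:
        transcript.append(recognizer.recognize_google(segment))
    except sr.UnknownValueError:
        transcript.append("[inaudible]")  # recognizer gave up on this chunk

print(" ".join(transcript))
```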


The long quest for technology that understands speech as well as a human

#artificialintelligence

Sitting in his office overlooking downtown Bellevue, Washington, Microsoft's Fil Alleva is talking about the long and sometimes difficult road that he and other speech recognition experts have traveled from the early work of the 1970s to today, when he can turn to his computer, say, "Cortana, I want a pizza," and get results. The conversation quickly drifts deep into the technology that makes something like that possible, and then Alleva pauses. "What we all had in the back of our minds, whether we say it or not, was C-3PO," he admits with a grin. The personable "Star Wars" character who can understand and speak millions of languages may not have been the only inspiration for the world's leading researchers; some will also say that the universal translator featured prominently in "Star Trek" spurred their dreams along. But regardless of whether they were "Star Wars" fans or "Star Trek" loyalists, one thing is clear: the quest to create a computer that can understand spoken language as well as a person was for years so fanciful that the only thing to compare it to was science fiction.


Microsoft built technology that's better than a human at understanding a conversation

#artificialintelligence

[Photo caption: Back row, left to right: Wayne Xiong, Geoffrey Zweig, Frank Seide.] In December 2015, Microsoft Chief Scientist of Speech Xuedong Huang told Business Insider that "in the next four to five years, computers will be as good as humans" at understanding the words that come out of your mouth. Less than a year later, Microsoft has set a record with the announcement of a system that can transcribe the contents of a phone call with "the same or fewer errors" than actual human professionals trained in transcription. It's a huge milestone for speech recognition, coming just as gadgets like the Amazon Echo and Apple's AirPods prove that voice is going to play a big role in the future of technology. And by Huang's standard, that's mission accomplished.
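Claims like "the same or fewer errors" are conventionally scored as word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the system's output into a reference transcript, divided by the reference's length. Here is a minimal sketch of that metric; the sample sentences are made up for illustration, not drawn from Microsoft's benchmark.

```python
# Word error rate: Levenshtein edit distance over words, divided by the
# reference length -- the standard yardstick behind claims like
# "same or fewer errors than human transcribers".
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution
            d[i][j] = min(d[i - 1][j] + 1,                # deletion
                          d[i][j - 1] + 1,                # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# Illustrative example: one substitution in a four-word reference.
print(wer("i want a pizza", "i want the pizza"))  # 0.25
```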