Google's voice recognition software is nearing human-level accuracy, Mary Meeker said in her annual Internet Trends Report, delivered at the Code Conference today at the Terranea Resort in California. Google is now able to understand human language with 95 percent accuracy, thanks to machine learning algorithms that can detect speech and respond with meaningful results. The improvement has occurred at a rapid pace: since 2013, accuracy has improved nearly 20 percent, according to Meeker's slide. And as Google continues to build voice recognition into more of its products, like Google Translate and its mobile and Home voice-powered assistants, the company is moving toward a future where talking to our machines will one day be as seamless as talking to a friend.
Google has spent years refining and strengthening reCAPTCHA, a Turing-test-based system for proving that website users aren't robots. It typically challenges users it suspects of being bots by asking them to read distorted text and type it into a box, or to select groups of pictures that have something in common. Audio challenges are offered as an option for people with disabilities; these consist of sequences of recorded voices, and users are simply asked to type in what they hear. Last year, a University of Maryland team cracked the audio mechanism with unCAPTCHA, which combines free, public, online speech-to-text engines, including Google's own, with a phonetic mapping technique.
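To illustrate the phonetic-mapping step, here is a simplified toy sketch, not the actual unCAPTCHA code: the audio challenge is assumed to be a sequence of spoken digits, and a speech-to-text engine often returns near-homophones ("won", "for"), so each transcribed word is mapped back to the digit it most plausibly encodes. The homophone table and function names here are invented for illustration.

```python
# Toy phonetic mapping from a speech-to-text transcript to a digit string.
# Real attacks would query several engines and vote on the result.
DIGIT_HOMOPHONES = {
    "zero": "0", "oh": "0",
    "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2",
    "three": "3", "tree": "3",
    "four": "4", "for": "4", "fore": "4",
    "five": "5",
    "six": "6",
    "seven": "7",
    "eight": "8", "ate": "8",
    "nine": "9",
}

def map_transcript_to_digits(transcript: str) -> str:
    """Map a transcribed audio challenge to its digit string."""
    digits = []
    for word in transcript.lower().split():
        token = word.strip(".,!?")
        if token.isdigit():
            # The engine already returned a literal digit.
            digits.append(token)
        elif token in DIGIT_HOMOPHONES:
            digits.append(DIGIT_HOMOPHONES[token])
        # Unrecognized words are simply dropped in this sketch.
    return "".join(digits)
```

For example, a transcript like "won for tree" would be resolved to "143", which is then typed into the answer box.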
At I/O 2017, Sundar Pichai noted that computers are getting better at understanding voice input, with Google having achieved "significant breakthroughs" in speech recognition. In fact, Google's machine learning systems are now nearly on par with humans. According to Mary Meeker's annual Internet Trends Report, Google's machine learning-backed voice recognition -- as of May 2017 -- has achieved a 95% word accuracy rate for the English language. That current rate also happens to be the threshold for human accuracy. Quantifying Google's progress, accuracy has improved nearly 20% since 2013.
OpenFace is a Python and Torch implementation of face recognition with deep neural networks and is based on the CVPR 2015 paper FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google. Torch allows the network to be executed on a CPU or with CUDA. This research was supported by the National Science Foundation (NSF) under grant number CNS-1518865. Additional support was provided by the Intel Corporation, Google, Vodafone, NVIDIA, and the Conklin Kistler family fund. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and should not be attributed to their employers or funding sources.
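The core idea behind FaceNet, and hence OpenFace, is that the network maps each face image to a 128-dimensional embedding such that images of the same person land close together. A minimal sketch of the comparison step, assuming the embeddings have already been produced by the network, might look like this; the distance threshold is an illustrative placeholder, not a tuned value from the paper or library.

```python
import numpy as np

# Squared L2 distance threshold below which two embeddings are
# treated as the same identity. Illustrative value only.
SAME_PERSON_THRESHOLD = 1.0

def squared_l2(a: np.ndarray, b: np.ndarray) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    d = a - b
    return float(np.dot(d, d))

def same_person(emb1, emb2, threshold=SAME_PERSON_THRESHOLD) -> bool:
    """Judge two faces as the same identity when their embeddings are close."""
    return squared_l2(np.asarray(emb1), np.asarray(emb2)) < threshold
```

Clustering, as in the paper's title, follows the same principle: embeddings are grouped with any standard distance-based clustering algorithm.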