China's leading Internet-search company, Baidu, has developed a voice system that can, in some cases, recognize English and Mandarin speech better than people can. The new system, called Deep Speech 2, is especially significant in that it relies entirely on machine learning for transcription. Whereas older voice-recognition systems include many handcrafted components to aid audio processing and transcription, the Baidu system learned to recognize words from scratch, simply by listening to thousands of hours of transcribed audio. The technology relies on a powerful technique known as deep learning, which involves training a very large multilayered virtual network of neurons to recognize patterns in vast quantities of data. The Baidu app for smartphones lets users search by voice, and also includes a voice-controlled personal assistant called Duer (see "Baidu's Duer Joins the Personal Assistant Party").
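To make the idea of learning "from scratch" concrete, here is a minimal toy sketch of a multilayered network trained only on labeled examples. It is not Baidu's system and has nothing to do with audio: the function name `train_tiny_network`, the XOR data, the layer size, and the learning rate are all assumptions chosen for illustration. The point is only that the weights start random and the pattern is picked up from examples rather than from handcrafted rules.

```python
import math
import random

# Hypothetical toy data (an assumption for illustration): the XOR pattern,
# which a network with no hidden layer cannot represent.
XOR_EXAMPLES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def train_tiny_network(examples, hidden=4, epochs=2000, lr=0.5, seed=0):
    """Train a one-hidden-layer network with per-example gradient descent.

    Returns (initial_loss, final_loss, predict), where predict maps an
    input tuple to a score between 0 and 1.
    """
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Weights start random: nothing about the task is hand-coded.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer (tanh units) followed by a single sigmoid output.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        z = sum(w * hi for w, hi in zip(w2, h)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def loss():
        # Mean squared error over all labeled examples.
        return sum((forward(x)[1] - t) ** 2 for x, t in examples) / len(examples)

    initial = loss()
    for _ in range(epochs):
        for x, t in examples:
            h, y = forward(x)
            # Gradient of the squared error with respect to the output's input.
            dz = 2.0 * (y - t) * y * (1.0 - y)
            for j in range(hidden):
                # Backpropagate through the tanh unit before updating w2[j].
                dh = dz * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * dz * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * dh * x[i]
                b1[j] -= lr * dh
            b2 -= lr * dz
    return initial, loss(), (lambda x: forward(x)[1])


if __name__ == "__main__":
    before, after, predict = train_tiny_network(XOR_EXAMPLES)
    print("loss before training:", before)
    print("loss after training:", after)
    for inputs, target in XOR_EXAMPLES:
        print(inputs, "target", target, "prediction", predict(inputs))
```

A production speech system differs in almost every respect (scale, architecture, input features, training procedure), so this should be read as an illustration of the training principle only.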
Amazon's voice assistant Alexa has become a hugely popular and growing business. In fact, David Limp, an Amazon senior vice president who oversees Alexa and all of Amazon's devices, says that Alexa is rapidly adding "skills," with more than 1,000 people working on it. On Tuesday, at Fortune's Brainstorm Tech conference, Limp spoke to Fortune's Adam Lashinsky about everything from the inspiration for Alexa (hint: think Star Trek) and the origin of the name to where the business is heading. Here is the lightly edited transcript. Dave Limp: The device business is less about building hardware for customers and more about building services behind that hardware. So the original vision of Kindle was to deliver any book ever written in less than 60 seconds, and that was all about creating a cloud-based service that had a great catalogue of books, great selection, and great prices. And as we've rolled out devices since then, everything from Fire TV to, as you mentioned, Echo and Alexa and everything in between, it's about creating that backend service that constantly improves and adds value for customers, and isn't just a gadget but instead a full end-to-end service that can benefit what customers want.
The market for voice-controlled assistants is on fire at the moment. Siri, Cortana, Alexa, and Google Assistant, to name just a few, are the best-known digital assistants currently dominating the market. Their cornerstone is the aim of providing a seamless, hands-free experience that empowers us and eases our daily lives. From playing music to ordering pizza, voice assistants seem to be taking the market by storm. In 2015, 1.7 million voice-first devices were shipped, followed by 6.5 million in 2016, and those figures exclude voice services built into mobile phones.
Our smartphones currently represent the most expensive real estate per square centimeter (even more expensive than the price per square meter of houses in Beverly Hills), and it is not hard to envision that having a bot as the sole interface will make this area worth almost zero. None of this would be possible, though, without heavy investment in speech recognition research. Deep Reinforcement Learning (DRL) has been the boss in town for the past few years, and it has been fed by human feedback. However, I personally believe that we will soon move toward B2B (bot-to-bot) training for a very simple reason: the reward structure. Humans spend time training their bots only if they are sufficiently compensated for their effort.
Rajeev Rastogi, who heads the Machine Learning team at Amazon, explains how the global ecommerce giant employs Artificial Intelligence to improve the online shopping experience. Edited excerpts: In which areas does Amazon use AI? We are applying AI to a number of problems, including speech recognition, natural language understanding, question answering, dialog systems, product recommendations, product search, and forecasting future product demand. We have used Deep Learning to improve speech recognition. We use neural networks to convert speech (spoken by users) to text with very high accuracy.