Moore's law has driven silicon chip circuitry to the point where we are surrounded by devices equipped with microprocessors. The devices are frequently wonderful; communicating with them, not so much. Pressing buttons on smart devices or keyboards is often clumsy and is rarely the method of choice when effective voice communication is possible. The key word in the previous sentence is "effective". Technology has advanced to the point where we are in the early stages of being able to communicate with our devices by voice.
Artificial intelligence has begun working its way into every tech product and service. Now, companies are changing the underlying hardware to accommodate this shift. Apple is the latest company to create a dedicated AI processing chip to speed up AI algorithms and save battery life on its devices, according to Bloomberg. The Bloomberg report said the chip is known internally as the Apple Neural Engine and will be used to assist devices with facial and speech recognition tasks. The latest iPhone, the iPhone 7, runs some of its AI tasks (mostly related to photography) using the image signal processor and the graphics processing unit integrated on its A10 Fusion chip.
Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides the deep learning functionality of automatic speech recognition (ASR), which converts speech to text, and natural language understanding (NLU), which recognizes the intent of the text, so that you can build applications with highly engaging user experiences and lifelike conversational interactions. With Amazon Lex, the same deep learning technologies that power Amazon Alexa are available to any developer, enabling you to quickly and easily build sophisticated, natural-language conversational bots ("chatbots").
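To make the NLU step concrete, here is a toy sketch of intent recognition on text that an ASR stage is assumed to have already produced. It is not the Amazon Lex API and uses no deep learning: the intent names, the sample utterances, and the functions `tokenize` and `recognize_intent` are all hypothetical, and simple word overlap stands in for a trained model.

```python
import re

# Hypothetical intents with sample utterances, invented for illustration only.
INTENTS = {
    "OrderFlowers": ["i would like to order flowers", "send some flowers"],
    "BookHotel": ["book a hotel room", "i need a place to stay"],
}


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def recognize_intent(transcript):
    """Return the intent whose sample utterance overlaps most with the transcript.

    Overlap is measured with Jaccard similarity (shared words / all words).
    Returns None when no sample shares any word with the transcript.
    """
    words = tokenize(transcript)
    best_intent, best_score = None, 0.0
    for intent, samples in INTENTS.items():
        for sample in samples:
            sample_words = tokenize(sample)
            score = len(words & sample_words) / len(words | sample_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    # The transcript stands in for the text output of a speech recognizer.
    print(recognize_intent("Could you book me a hotel room?"))
```

A production system would replace the word-overlap score with a learned model, but the shape of the pipeline is the same: speech becomes text, and text is mapped to an intent.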
There are many situations in which running deep learning inference on local devices is preferable for both individuals and companies: imagine traveling with no reliable internet connection, or dealing with privacy concerns and latency when transferring data to cloud-based services. Edge computing addresses these problems by processing and analyzing data at the edge of the network. Take the "Ok Google" feature as an example: once "Ok Google" has been trained with a user's voice, that user's phone wakes up when it detects the keyword. This kind of small-footprint keyword-spotting (KWS) inference usually happens on-device, so you don't have to worry that service providers are listening to you all the time. The cloud-based services are contacted only after you issue a command.
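The gating idea can be sketched in a few lines. The example below assumes an on-device model has already produced a per-frame keyword score between 0 and 1; the model itself is not shown, and `detect_keyword`, `gate_to_cloud`, the window size, and the threshold are all hypothetical choices for illustration, not how any particular product works. Scores are averaged over a short sliding window so that a single noisy frame does not wake the device, and only audio captured after a detection is handed on.

```python
from collections import deque


def detect_keyword(scores, window=5, threshold=0.8):
    """Yield frame indices where the smoothed keyword score reaches the threshold.

    The detector fires once per keyword and re-arms only after the
    smoothed score has dropped back below the threshold.
    """
    recent = deque(maxlen=window)  # the last `window` per-frame scores
    armed = True
    for index, score in enumerate(scores):
        recent.append(score)
        smoothed = sum(recent) / len(recent)
        if armed and len(recent) == window and smoothed >= threshold:
            armed = False
            yield index
        elif smoothed < threshold:
            armed = True


def gate_to_cloud(frames, scores, window=5, threshold=0.8):
    """Return only the frames that follow the first keyword detection.

    If the keyword is never detected, nothing leaves the device.
    """
    for index in detect_keyword(scores, window, threshold):
        return list(frames[index + 1:])
    return []


if __name__ == "__main__":
    # Made-up scores: silence, then the keyword, then a spoken command.
    scores = [0.1] * 5 + [0.95] * 6 + [0.1] * 5
    frames = list(range(len(scores)))  # frame numbers stand in for audio
    print(gate_to_cloud(frames, scores))
```

Everything before the detection stays on the device, which is the privacy property the paragraph above describes.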
To learn more about conversational AI, check out Yishay Carmiel's session "Applications of neural-based models for conversational speech" at the Artificial Intelligence Conference in San Francisco, Sept. 17-20, 2017. The dream of speech recognition is a system that truly understands humans speaking, in different environments and with a variety of accents and languages. For decades, people tackled this problem with no success. Pinpointing effective strategies for creating such a system seemed impossible. In recent years, however, breakthroughs in AI and deep learning have changed everything in the quest for speech recognition.