If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Today concludes Amazon's re:Invent 2019 conference in Las Vegas, where the Seattle company's Amazon Web Services (AWS) division unveiled enhancements heading down its public cloud pipeline. Just Tuesday, Amazon announced the general availability of AWS Outposts, a fully managed service that extends AWS' infrastructure and services to customer datacenters, co-location spaces, and on-premises facilities. And it debuted in preview Amazon Detective, which helps to analyze, investigate, and identify the root cause of potential security issues and suspicious activities. That's not to mention AI-powered fraud detection and code review products and an expanded machine learning experimentation and development studio, as well as a dedicated instance for AI inferencing workloads. But perhaps the most intriguing launch this week was that of Amazon Transcribe Medical, a service that's designed to transcribe medical speech for clinical staff in primary care settings.
This week, for example, Nvidia (NASDAQ: NVDA) added a secure learning tool to its Clara AI platform aimed at medical imaging. Amazon followed with a speech recognition system that allows developers to add transcription services to medical apps. Along with eligibility under HIPAA, the Health Insurance Portability and Accountability Act, the service known as Amazon Transcribe Medical also responds to an electronic health records law called the HITECH Act. The 2009 law requires physicians to provide detailed data entries in patient records. That requirement has added to clinicians' workloads, often reducing the amount of time spent with patients.
On 2 December 2019, Amazon expanded its automatic transcription service for AWS to include support for medical speech. Transcriptions come in many forms -- for movies and entertainment content, transcribing audio for the hearing impaired, audio for voice-over, etc. -- but one of the most essential applications is in medical practice. If you have a doctor in the family, you have likely seen them spend a significant amount of time talking into a recorder about medical conditions so they can be documented later. And by later, we mean that a medical transcriber listens to the recording a day or so afterward and transcribes it into a format that can be documented and archived. Although you could say that the transcriber simply removes the unnecessary 'ehs' and 'uhs' from the recording, a transcriber does more than that.
Speech recognition is the task of detecting spoken words. There are many techniques for performing speech recognition. In this post, we will go through some of the background required for speech recognition and use a basic technique to build a speech recognition model. The code is available on GitHub; for the techniques mentioned in this post, see this Jupyter Notebook. Let's take a step back and understand what audio actually is. We all listen to music on our computers and phones.
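Before any recognition happens, it helps to see that digital audio is just a sequence of amplitude samples taken at a fixed rate. The sketch below (using NumPy; the 440 Hz tone and 16 kHz rate are illustrative choices, not from the post) generates a pure tone and recovers its frequency with an FFT, the same basic operation behind the spectrograms used in speech models:

```python
import numpy as np

# Digital audio is amplitude samples taken at a fixed rate.
SAMPLE_RATE = 16_000          # 16 kHz, a common rate for speech
DURATION = 1.0                # one second of "audio"

t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440.0 * t)   # a 440 Hz sine wave "recording"

# The FFT turns samples into frequency content; with a 1-second
# window each bin is exactly 1 Hz wide, so the peak lands at 440.
spectrum = np.abs(np.fft.rfft(tone))
peak_hz = np.argmax(spectrum) * SAMPLE_RATE / len(tone)
print(peak_hz)  # 440.0
```

Real speech is a mix of many frequencies changing over time, which is why speech systems typically slice the signal into short windows and compute a spectrogram rather than one global FFT.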
Two of the issues that vex both developers and users are voice recognition accuracy (do we get it right?) and response time (do we get it done fast?). Two recent developments point the same way: the voice assistant built into Google's Pixel 4 phone, and Sonos' acquisition of Snips, which makes an AI voice processing platform. Both signal a shift in approach to accuracy and responsiveness that the industry can take to make voice control an even more compelling user interface. What these developments have in common is the addition of much more sophisticated onboard voice recognition processing to the local device. Most devices with speech recognition today pass a voice spectrogram to the cloud to be processed; moving this processing out of the cloud removes the bottleneck from response time and enables the development of voice control interfaces for specific applications that will be much more accurate than today's experience. For Sonos and Snips, this allows simple voice commands like play, pause, and stop to be processed completely locally.
But even state-of-the-art systems struggle to overcome ambiguities in lip movements, preventing their performance from surpassing that of audio-based speech recognition. In pursuit of a more performant system, researchers at Alibaba, Zhejiang University, and the Stevens Institute of Technology devised a method dubbed Lip by Speech (LIBS), which uses features extracted from speech recognizers as complementary clues. They say it achieves industry-leading accuracy on two benchmarks, besting the baseline by margins of 7.66% and 2.75% in character error rate. LIBS and solutions like it could help people who are hard of hearing follow videos that lack subtitles. An estimated 466 million people in the world suffer from disabling hearing loss, or about 5% of the world's population.
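The benchmark metric mentioned above, character error rate (CER), is commonly computed as the character-level edit distance between a system's hypothesis and the reference transcript, divided by the reference length. A minimal sketch of that standard computation (not the LIBS authors' code):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings: insertions,
    deletions, and substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)

print(cer("speech", "spch"))  # 2 deletions over 6 characters ~ 0.33
```

A 7.66% improvement in CER therefore means noticeably fewer wrong, missing, or inserted characters per transcript.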
Following the rollout of its cloudless, edge device-focused voice assistant stack, which comprises wake word, speech-to-text translation, and speech-to-intent capabilities, Picovoice announced a web console that lets you easily create and train your own voice models. Alongside the web console release, the company joined the Arm AI Ecosystem Partner Program, which gives Picovoice deeper access to ARM IP and to chip manufacturers like NXP. Specifically, Picovoice is focused on ARM Cortex-M chip designs, which are extremely low power and can integrate into all manner of IoT devices -- but are powerful enough to support its voice assistant without the need for a cloud connection. The big idea is that OEMs can use the Picovoice web console to whip up voice controls for their devices large and small, for minimal cost. Products with voice assistants on board are hot, and although the likes of smart speakers and smart displays get the bulk of the attention, some level of voice control is possible on all manner of lower-power edge devices, from coffee makers to lights.
In developing the ground-breaking Seeing AI app, Saqib Shaikh and the team at Microsoft were driven by this simple but powerful re-framing he articulated: "What if we could look at disability as an engine of innovation?" "There are so many examples where the technologies we rely on today were inspired or influenced by disability, from speech recognition and text-to-speech to the touch screen itself. There's this terminology of inclusive design, where if you focus in on one person's needs, doing that can help you create solutions which benefit a broader population. With Seeing AI, we focus in on the needs of people who are blind or have low vision, but in doing that I believe it also helps us make better products for all customers," said Shaikh. We spoke about the evolution of the platform and how the team approached adding new features. "We're always listening to our customers (the low or no vision community) and understanding what are the challenges that they face. And then we're talking to the scientists and engineers at Microsoft to see what are the emerging technologies we can leverage. And with each of these, we consider the type of task you can complete," said Shaikh.
Billions of pieces of text data are generated every day, with countless channels constantly producing large amounts of text every second. Because of these volumes, and because the sources are highly unstructured, we can no longer rely on manual approaches to understand the text, and this is where NLP comes in. With big data technology, NLP has entered the mainstream, as it can now be applied to large volumes of text data via cloud/distributed computing at unprecedented speed. Imagine being given a sentence and tasked with identifying, manually, whether it carries positive, negative, or neutral sentiment.
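To make that manual-labeling thought experiment concrete, here is a toy lexicon-based sentiment classifier. The word lists are invented for illustration; real NLP systems use far richer models, but the input/output shape of the task is the same:

```python
# Tiny hand-built sentiment lexicon -- illustrative only.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}

def classify_sentiment(sentence: str) -> str:
    """Label a sentence positive/negative/neutral by counting
    lexicon hits, mimicking the manual task described above."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this great product"))  # positive
print(classify_sentiment("the service was terrible"))   # negative
```

Doing this by hand for millions of sentences is plainly infeasible, which is exactly the scaling argument the paragraph above makes for NLP.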
As part of re:Invent, today AWS announced Amazon Transcribe Medical, a new HIPAA-eligible, machine learning automatic speech recognition (ASR) service that allows developers to add medical speech-to-text capabilities to their applications. Transcribe Medical provides accurate and affordable medical transcription, enabling healthcare providers, IT vendors, insurers, and pharmaceutical companies to build services that help physicians, nurses, researchers, and claims agents improve the efficiency of medical note-taking. Today, clinicians can spend an average of up to six additional hours per day, on top of existing medical tasks, just writing notes for electronic health record (EHR) data entry. Not only is the process time-consuming and exhausting for physicians, it is also a leading factor in workplace burnout and stress, distracting physicians from engaging patients attentively and resulting in poorer patient care and rushed visits. While medical scribes have been employed to assist with manual note-taking, that solution is expensive and difficult to scale across thousands of medical facilities, and some patients find the presence of a scribe uncomfortable, leading to less candid discussions about symptoms.
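Developers reach the service through the StartMedicalTranscriptionJob API. Below is a minimal sketch of what a call looks like with boto3; the parameter names follow the AWS API reference, but the job name, audio URI, and bucket names are placeholders, and actually starting the job requires AWS credentials:

```python
def build_medical_job(job_name: str, media_uri: str, output_bucket: str) -> dict:
    """Assemble parameters for Transcribe Medical's
    StartMedicalTranscriptionJob API."""
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",            # US English
        "Specialty": "PRIMARYCARE",         # primary care, per the launch
        "Type": "DICTATION",                # or "CONVERSATION" for multi-speaker
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,  # transcripts land in your S3 bucket
    }

def start_job(params: dict):
    """Submit the job -- illustration only; needs AWS credentials."""
    import boto3
    client = boto3.client("transcribe")
    return client.start_medical_transcription_job(**params)

params = build_medical_job(
    "demo-visit-notes",                    # placeholder job name
    "s3://example-bucket/visit.wav",       # placeholder audio location
    "example-output-bucket",               # placeholder output bucket
)
```

Once the job completes, the JSON transcript appears in the specified output bucket, ready to be fed into an EHR workflow.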