"Automatic speech recognition (ASR) is one of the fastest growing and commercially most promising applications of natural language technology. Speech is the most natural communicative medium for humans in many situations, including applications such as giving dictation; querying database or information-retrieval systems; or generally giving commands to a computer or other device, especially in environments where keyboard input is awkward or impossible (for example, because one's hands are required for other tasks)."
– Andreas Stolcke, "Linguistic Knowledge and Empirical Methods in Speech Recognition," AI Magazine 18(4): 25-32, 1997.
Voice assistants are becoming an essential part of our daily lives. When Apple's Siri hit the market in 2011, it attracted considerable attention from tech enthusiasts, yet no one was certain whether this novelty would spark a tech revolution. Today, we are regular users of Google Assistant, Amazon Alexa, and many more. Things took a turn when Google Home, Amazon Echo, and Apple HomePod went mainstream in 2017. All of these examples show how voice assistants are proving themselves to be a technology enabler with impressive possibilities, not only in households but increasingly in business settings as well.
In today's world, it is nearly impossible to avoid voice-controlled digital assistants. From the interactive intelligent agents used by corporations and government agencies to personal devices, automatic speech recognition (ASR) systems, combined with machine learning (ML) technology, are increasingly used as an input modality that allows humans to interact with machines, ostensibly in the most common and simplest way possible: by speaking in a natural, conversational voice. Yet as a study published in May 2020 by researchers from Stanford University indicated, the accuracy of ASR systems from Google, Facebook, Microsoft, and others varies widely depending on the speaker's race. While this study focused only on the differing accuracy levels for a small sample of African American and white speakers, it points to a larger concern about ASR accuracy and phonological awareness, including the ability to discern and understand accents, tonalities, rhythmic variations, and speech patterns that may differ from the voices used to initially train voice-activated chatbots, virtual assistants, and other voice-enabled systems. The Stanford study, published in the journal Proceedings of the National Academy of Sciences, measured the error rates of ASR technology from Amazon, Apple, Google, IBM, and Microsoft by comparing each system's performance in understanding identical phrases (taken from pre-recorded interviews across two datasets) spoken by 73 black and 42 white speakers, then comparing the average word error rate (WER) for the two groups.
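Word error rate, the metric used in the Stanford study, is the word-level edit distance between a system's transcript and a reference transcript (substitutions, insertions, and deletions) divided by the number of words in the reference. The following is a minimal sketch of how WER is typically computed; the function name and example phrases are illustrative, not taken from the study's data.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via dynamic-programming edit distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                        # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                        # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a five-word reference -> WER = 1/5 = 0.2
print(wer("please call stella right now", "please call stella right away"))
```

Averaging this quantity over many utterances per speaker group is what allows a study to compare ASR accuracy across populations; a WER of 0.2 means roughly one in five reference words was transcribed incorrectly.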
Uday Kamath has more than 20 years of experience architecting and building analytics-based commercial solutions. He currently works as the Chief Analytics Officer at Digital Reasoning, one of the leading companies in AI for NLP and Speech Recognition, heading the Applied Machine Learning research group. Most recently, Uday served as the Chief Data Scientist at BAE Systems Applied Intelligence, building machine learning products and solutions for the financial industry, focused on fraud, compliance, and cybersecurity. Uday has previously authored many books on machine learning such as Machine Learning: End-to-End guide for Java developers: Data Analysis, Machine Learning, and Neural Networks simplified and Mastering Java Machine Learning: A Java developer's guide to implementing machine learning and big data architectures. Uday has published many academic papers in different machine learning journals and conferences.
The vision for this future is to unlock the human voice as a meaningful measurement of health. AI voice assistants can transform speech into a vital sign, enabling early detection and prediction of oncoming conditions. Just as temperature is an indicator of fever, vocal biomarkers can provide us with a more complete picture of our health. One in four people globally will be affected by major or minor mental health issues at some point in their lives. Around 450 million people currently suffer from conditions such as anxiety, stress, and depression, placing mental health among the leading causes of ill health worldwide.
Following a conversation and transcribing it precisely is one of the biggest challenges in artificial intelligence (AI) research. For the first time, researchers at Karlsruhe Institute of Technology (KIT) have succeeded in developing a computer system that outperforms humans in recognizing such spontaneously spoken language with minimal latency. This is reported on arXiv.org. "When people talk to each other, there are stops, stutterings, hesitations, such as 'er' or 'hmmm,' laughs and coughs," says Alex Waibel, Professor for Informatics at KIT. "Often, words are pronounced unclearly." This makes it difficult even for people to make accurate notes of a conversation.
With the advent of new deep learning approaches based on the transformer architecture, natural language processing (NLP) techniques have undergone a revolution in performance and capabilities. Cutting-edge NLP models are becoming the core of modern search engines, voice assistants, chatbots, and more. Modern NLP models can synthesize human-like text and answer questions posed in natural language. As DeepMind research scientist Sebastian Ruder says, NLP's ImageNet moment has arrived. While NLP has grown in mainstream use cases, it is still not widely adopted in healthcare, clinical applications, and scientific research.
Voice assistants such as Alexa and Siri will become common in children's bedrooms, according to a new report from Internet Matters, the online safety body, which says it is critical for parents to spend more time understanding new technology. The pandemic has accelerated the adoption of new technology at home by "three or four years", the researchers said, and families in the UK will become much more reliant on voice-enabled devices over the next five years. The report's author, Lynne Hall, professor of computer science at the University of Sunderland, said we would even see the emergence of a range of celebrity voice assistants. "You'd have Elsa from Frozen," Hall said. "You can imagine that with every Disney film that came out there would be a new voice skin."
"What's that song that goes laaaa, laaa, la la la la laaa?" If you've ever found yourself asking such a question -- and I think that's most of us, at some point or the other -- Google has a new feature to help. Starting today, iOS and Android users can find a song by simply humming the relevant tune in the Google app or Search widget. Just tap on the microphone icon, say "what's this song?" and start humming for 10-15 seconds. No lyrics, artist name, or anything else required -- just not being totally tone-deaf, presumably.
We live in an exciting time of innovation, progress, technology, and commerce. Some of the latest tech inventions, such as artificial intelligence and machine learning, are making a tremendous impact on every industry. Voice assistants powered by AI have already transformed eCommerce. eCommerce giants such as Amazon continue to fuel this trend as they compete for market share. Voice interfacing is advancing at an exceptional rate in the healthcare and banking industries to keep pace with consumer demand.