Speech Recognition

Amazon, Apple, Microsoft, Meta and Google to improve speech recognition for people with disabilities


The University of Illinois (UIUC) has partnered with Amazon, Apple, Google, Meta, Microsoft and nonprofits on the Speech Accessibility Project. The aim is to improve voice recognition for communities with disabilities and diverse speech patterns often not considered by AI algorithms. That includes people with Lou Gehrig's disease (ALS), Parkinson's, cerebral palsy, Down syndrome and other conditions that affect speech. "Speech interfaces should be available to everybody, and that includes people with disabilities," UIUC professor Mark Hasegawa-Johnson said. "This task has been difficult because it requires a lot of infrastructure, ideally the kind that can be supported by leading technology companies, so we've created a uniquely interdisciplinary team with expertise in linguistics, speech, AI, security and privacy."

How will OpenAI's Whisper model impact AI applications?


Last week, OpenAI released Whisper, an open-source deep learning model for speech recognition. Developers and researchers who have experimented with Whisper have been impressed with what the model can do. However, what is perhaps equally important is what Whisper's release tells us about the shifting culture in artificial intelligence (AI) research and the kinds of applications we can expect in the future.

Enrich Your Web Application With a Free A.I. Voice Recognition


But it is not user-friendly. At this point, we need to trigger the recording from the UI, and not from the console. Similarly, we need to process the speech predictions and display them on the page or use them somehow. To finalize our concise demo, we'll simply add a button to allow the user to start the recording and we'll display the predictions on the page, as a list.
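A minimal sketch of that wiring, using the browser's Web Speech API. The element ids (`startBtn`, `results`) are assumptions about the demo's markup, not names from the original tutorial:

```javascript
// Pure helper: flatten a SpeechRecognition results list into trimmed transcripts.
// Kept separate from the browser wiring so it is easy to reason about on its own.
function transcriptsFrom(results) {
  const items = [];
  for (const result of results) {
    items.push(result[0].transcript.trim()); // best alternative per result
  }
  return items;
}

// Browser-only wiring (guarded so the helper above also loads outside a browser).
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognition = new Recognition();
  recognition.continuous = true;

  // The button triggers recording from the UI instead of the console.
  document.getElementById("startBtn").addEventListener("click", () => {
    recognition.start();
  });

  // Each result event re-renders the predictions as a list on the page.
  recognition.addEventListener("result", (event) => {
    const list = document.getElementById("results");
    list.innerHTML = "";
    for (const text of transcriptsFrom(event.results)) {
      const li = document.createElement("li");
      li.textContent = text;
      list.appendChild(li);
    }
  });
}
```

Keeping the transcript-formatting logic in a plain function, apart from the DOM wiring, makes the demo easier to extend later (for example, to show interim versus final results).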

OpenAI can hear you Whisper


Speech recognition remains a challenge in artificial intelligence, but OpenAI's latest release takes us one step closer to solving it. The software is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Other organizations like Google, Meta and Amazon have all designed ASR systems that lie at the core of many products, and OpenAI's could now outperform every one of them. What makes this new software different is its robustness to background noise, accents and technical terminology.

AI is already better at lip reading than we are


They Shall Not Grow Old, a 2018 documentary about the lives and aspirations of British and New Zealand soldiers living through World War I from acclaimed Lord of the Rings director Peter Jackson, had its hundred-plus-year-old silent footage modernized through both colorization and the recording of new audio for previously non-existent dialog. To get an idea of what the folks featured in the archival footage were saying, Jackson hired a team of forensic lip readers to guesstimate their recorded utterances. Reportedly, "the lip readers were so precise they were even able to determine the dialect and accent of the people speaking." "These blokes did not live in a black and white, silent world, and this film is not about the war; it's about the soldier's experience fighting the war," Jackson told the Daily Sentinel in 2018. "I wanted the audience to see, as close as possible, what the soldiers saw, and how they saw it, and heard it."

The Evolving Landscape of Automatic Speech Recognition


Automatic speech recognition (ASR) has come a long way. Though the underlying technology dates back decades, it saw little practical use for most of that time. Time and technology have since changed significantly, and audio transcription has evolved substantially. Technologies such as AI (artificial intelligence) now power the audio-to-text process, delivering quick and accurate results.

Voice assistants could 'hinder children's social and cognitive development'

The Guardian

From reminding potty-training toddlers to go to the loo to telling bedtime stories and being used as a "conversation partner", voice-activated smart devices are being used to help rear children almost from the day they are born. But the rapid rise in voice assistants, including Google Home, Amazon Alexa and Apple's Siri could, new research suggests, have a long-term impact on children's social and cognitive development, specifically their empathy, compassion and critical thinking skills. "The multiple impacts on children include inappropriate responses, impeding social development and hindering learning opportunities," said Anmol Arora, co-author of research published in the journal Archives of Disease in Childhood. A key concern is that children attribute human characteristics and behaviour to devices that are, said Arora, "essentially a list of trained words and sounds mashed together to make a sentence." The children anthropomorphise and then emulate the devices, copying their failure to alter their tone, volume, emphasis or intonation.

Why Speech Separation is Such a Difficult Problem to Solve


You are talking on the phone, recording audio, or speaking to a voice assistant like Google Assistant, Cortana, or Alexa. But the person on the other side of the call cannot hear you because you are in a crowded place, the recorded audio has a lot of background noise, or your "Hey, Alexa" wasn't picked up by the device because someone else started speaking. All of these problems related to separating voices, informally referred to as the "cocktail party problem", have been addressed using artificial intelligence and deep learning methods in recent years. Still, separating and inferring multiple simultaneous voices remains a difficult problem to solve completely. To start with a definition: speech separation is the task of extracting the speech of the "wanted speaker", or "speaker of interest", from an overlapping mixture of speech from other speakers, which is also referred to as "noise".
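One common family of approaches to this problem is time-frequency masking, which can be sketched on toy data. The tiny arrays below stand in for magnitude spectrograms (rows are time frames, columns are frequency bins); in a real system the mask would be estimated from the mixture by a neural network, not computed from the known sources:

```javascript
// Toy sketch of time-frequency masking for speech separation.
// In practice these magnitudes come from a short-time Fourier transform of real audio.

// Ideal binary mask: keep each time-frequency bin where the target speaker dominates.
function idealBinaryMask(target, interference) {
  return target.map((frame, t) =>
    frame.map((mag, f) => (mag >= interference[t][f] ? 1 : 0))
  );
}

// Apply the mask to the mixture to recover an estimate of the target's magnitudes.
function applyMask(mixture, mask) {
  return mixture.map((frame, t) => frame.map((mag, f) => mag * mask[t][f]));
}

// Two toy "speakers" and their mixture (magnitudes simply added here for illustration).
const speaker = [[3, 0.2], [2.5, 0.1]];
const noise = [[0.3, 2], [0.2, 1.8]];
const mixture = speaker.map((frame, t) => frame.map((m, f) => m + noise[t][f]));

const mask = idealBinaryMask(speaker, noise);
const estimate = applyMask(mixture, mask);
// Bins where the wanted speaker dominated survive; the rest are zeroed out.
```

The "ideal" mask here uses oracle knowledge of both sources, which is exactly what makes the real problem hard: a deployed separator only ever sees the mixture.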

La veille de la cybersécurité


The last few years have seen increased adoption of voice technology, with the usage of voice assistants booming across the globe. A lot of it has to do with advancements in speech recognition technology, easy accessibility to voice interfaces and availability at the right time and the right place. Not only that, but Covid-19 has acted as a catalyst for businesses. Popularly referred to as the "fourth channel of sales," voice technology is changing how consumers interact with brands, with many preferring the immediacy and personal feel of phone calls. Voice assistants are not only helping people get through their regular routines; they have become essential for businesses hoping to assist customers, improve employee engagement, enhance communication efficiencies and elevate user experiences.