"Automatic speech recognition (ASR) is one of the fastest growing and commercially most promising applications of natural language technology. Speech is the most natural communicative medium for humans in many situations, including applications such as giving dictation; querying database or information-retrieval systems; or generally giving commands to a computer or other device, especially in environments where keyboard input is awkward or impossible (for example, because one's hands are required for other tasks)."
– Andreas Stolcke (1997), "Linguistic Knowledge and Empirical Methods in Speech Recognition," AI Magazine 18(4): 25–32.
Speech-to-text applications have never been so plentiful, popular or powerful, with researchers' pursuit of ever-better automatic speech recognition (ASR) performance bearing fruit thanks to huge advances in machine learning technologies and the increasing availability of large speech datasets. Current speech recognition systems require thousands of hours of transcribed speech to reach acceptable performance, but a lack of transcribed audio for the less widely spoken of the world's 7,000 languages and dialects makes it difficult to train robust systems for them. To help ASR development for such low-resource languages and dialects, Facebook AI researchers have open-sourced wav2vec 2.0, a new algorithm for self-supervised learning of speech representations. The paper Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations claims to "show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler." A Facebook AI tweet says the new algorithm can enable automatic speech recognition models with just 10 minutes of transcribed speech data.
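At the heart of wav2vec 2.0's self-supervised stage is a contrastive task: for a masked time step, the model must pick the true quantized latent out of a set of distractors. The following is only a minimal NumPy sketch of an InfoNCE-style contrastive loss of that general shape — the function name, temperature value, and toy vectors are our own, and the real model operates on learned, quantized representations rather than hand-built vectors:

```python
import numpy as np

def contrastive_loss(context, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: the context vector should be more similar
    to the true (positive) latent than to any distractor latent."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # similarity of the context to the positive and to each distractor
    sims = np.array([cosine(context, positive)] +
                    [cosine(context, n) for n in negatives]) / temperature
    sims -= sims.max()                       # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                 # cross-entropy on the positive

context = np.array([1.0, 0.0])
aligned = np.array([1.0, 0.0])
distractor = np.array([0.0, 1.0])

# low loss when the positive matches the context...
low = contrastive_loss(context, aligned, [distractor])
# ...high loss when the roles are swapped
high = contrastive_loss(context, distractor, [aligned])
```

Minimising a loss of this kind pushes the model to produce context representations that discriminate the correct speech unit from distractors — which is what lets it learn from audio alone, before any transcribed data is seen.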
Artificial intelligence has become a technological buzzword, often reduced to the shorthand "AI" rather than a term that conveys the near-infinite range of practical applications artificial intelligence can actually provide, or the intricacies involved from industry to industry and region to region. To discuss some of the many applications of artificial intelligence, as well as some of the considerations involved in creating more accurate and less biased machine learning systems, I had the pleasure of speaking with Nitendra Rajput, VP and Head of Mastercard's AI Garage. Nitendra set up the centre to solve problems across various business verticals globally with machine learning, increasing efficiencies across the business as well as mitigating instances of fraud. He has over 20 years' experience in the fields of artificial intelligence, machine learning and mobile interactions, having recognised a gap in the market for speech recognition systems for vocally-led countries such as India. Prior to Mastercard's AI Garage, he spent 18 years at IBM Research, working on different aspects of machine learning, human-computer interaction, software engineering and mobile sensing.
Healthcare has been one of the countless beneficiaries of the revolutionary advances that widespread computing has brought. Fast, efficient data organisation, storage and access have greatly sped up the medical enterprise, yet much low-hanging fruit remains. Chief among it is the wider application of technologies that can process speech. In this post, we'll share how speech technology can improve healthcare in three ways. Finally, (3) voice signal analysis can be used for earlier diagnosis and to help track changes in a medical condition over time.
It's a sci-fi trope: A universal translator that allows instantaneous communication between speakers of different languages. The TARDIS does it in Doctor Who, the Babel fish serves that function in The Hitchhiker's Guide to the Galaxy, and of course the linguist Hoshi Sato had one on the Enterprise. Cheetah Mobile, a Chinese mobile technology company that's had some bumps in the U.S. market, is coming out with a new version of an existing translator device, this one promising instantaneous two-way communication in 73 languages thanks to a standalone piece of kit that's smaller than a smartphone. Powered by Microsoft's automatic speech recognition software and OrionStar AI Technology, the device is meant to provide users with instant two-way translation in 73 languages while displaying text on a 1.54" IPS-LCD touch-screen, which offers a text-to-speech function as well. Conversations can be recorded for up to two hours in HD audio, and noise-cancelling microphones enable accurate translation even in noisy environments.
As part of the slew of news from its Ignite conference today, Microsoft announced some changes coming to its Outlook app for iOS and Android. Voice controls are getting expanded so you can dictate short emails or call a contact from the app, in addition to being able to speak search terms as you could do before. You'll also see suggested replies in emails where Microsoft detects a request for a meeting. In the latter, Outlook will offer options for you to send your availability or schedule a meeting to create a new event on your calendar. Those on Android will also get new actionable notifications so you can reply, archive or delete an incoming message when an alert comes in.
"It was the best of times, it was the worst of times." This quote from A Tale of Two Cities, by Charles Dickens, describes some of what we're experiencing during the COVID-19 pandemic. Recent months have revealed many of the best aspects of local communities and highlighted small acts of kindness such as checking on neighbors to finding new avenues for social interaction. However, recent events have also exposed a pre-existing bias towards older adults. The perspective that people 50 years' old and older are all just one monolithic group is a mistake for business, for product design, and for successful voice experiences.
Bengaluru-based food delivery major Swiggy is looking to incorporate artificial intelligence (AI)-driven speech recognition models in its call centre process. The company has partnered with BPO and social enterprise IndiVillage to power the platform's broader AI and machine learning (ML) charter. This engagement will also include voice annotation work that provides training data for Swiggy's ML algorithms. Swiggy, in its press statement, explained that there is a need to efficiently extract information from call data as call centre executives move from one call to another. This will help executives understand the 'voice of the customer', enabling a deeper understanding of the issues customers face so that they can be resolved accordingly.
Technology is making inroads into every sector. New inventions, innovations and devices are making life easier for everyone, and voice recognition technology is one such initiative to watch in this era of growing innovation. Voice recognition, also known as speech recognition, is a computer software program or hardware device with the ability to receive, interpret and understand voice input and carry out commands. The technology makes it possible to create and control documents simply by speaking.
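The "carry out commands" step described above is typically a thin layer on top of the recognizer: once the audio has been transcribed to text, the application maps known phrases to actions. Here is a toy sketch of such a dispatcher — the phrases, state shape, and function names are invented for illustration, and a real system would receive the transcript from an ASR engine:

```python
# Toy voice-command dispatcher. Assumes an ASR engine has already
# turned the user's speech into a transcript string.
def dispatch(transcript, state):
    commands = {
        "new document": lambda s: {**s, "open_docs": s["open_docs"] + 1},
        "close document": lambda s: {**s, "open_docs": max(0, s["open_docs"] - 1)},
    }
    # normalise the transcript before lookup
    action = commands.get(transcript.strip().lower())
    if action is None:
        return state, "Sorry, I didn't understand that."
    return action(state), "OK"

state = {"open_docs": 0}
state, reply = dispatch("New document", state)
```

A production system would add fuzzy matching and intent classification rather than exact phrase lookup, but the structure — transcribe, normalise, map to an action — is the same.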
Over 111.8 million people in the U.S. talk to voice assistants like Siri, Alexa, and Google Assistant every month, eMarketer estimates. Tens of millions of those people use assistants as data-finding tools, with the Global Web Index reporting that 25% of adults regularly perform voice searches on smartphones. But while voice assistants can answer questions about pop culture and world events like a pro, preliminary evidence suggests they struggle to supply information about elections. In a test of popular assistants' abilities to provide accurate, localized context concerning the upcoming U.S. presidential election, VentureBeat asked Alexa, Siri, and Google Assistant a set of standardized questions about procedures, deadlines, and misconceptions about voting. In general, the assistants fared relatively poorly, often answering questions with information about voting in other states or punting questions to the web instead of answering them directly. In light of historic misinformation efforts around the election, the shortcomings have the potential to sow confusion or hamper get-out-the-vote efforts -- especially among those with accessibility challenges who rely heavily on voice assistants.
In 2017, we launched Amazon Transcribe, an automatic speech recognition service that makes it easy for developers to add a speech-to-text capability to their applications. Since then, we added support for more languages, enabling customers globally to transcribe audio recordings in 31 languages, including 6 in real-time. A popular use case for Amazon Transcribe is transcribing customer calls. This allows companies to analyze the transcribed text using natural language processing techniques to detect sentiment or to identify the most common call causes. If you operate in a country with multiple official languages or across multiple regions, your audio files can contain different languages.
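For multilingual audio like this, Amazon Transcribe can identify the dominant language of a recording automatically instead of requiring a fixed language code. The sketch below only builds the parameter dictionary you might pass to the service's StartTranscriptionJob API via boto3 — the job name and S3 URI are placeholders, and no AWS call is made here:

```python
# Build parameters for Amazon Transcribe's StartTranscriptionJob API.
# Setting IdentifyLanguage lets the service detect the dominant
# language of the recording, so no fixed LanguageCode is supplied.
def build_transcription_request(job_name, media_uri):
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyLanguage": True,  # instead of a fixed LanguageCode
    }

# placeholder job name and S3 location
params = build_transcription_request(
    "call-analysis-demo", "s3://my-bucket/calls/example-call.wav")

# With AWS credentials configured, you would then run:
#   boto3.client("transcribe").start_transcription_job(**params)
```

The resulting transcript can then be fed to downstream natural language processing for sentiment detection or call-cause analysis, as described above.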