Speech technology encompasses speech understanding (recognition) and speech synthesis.
A specific use case worth exploring in this regard is MT for User Generated Content (UGC). Because of the speed with which UGC (comments, feedback, reviews) is created and the corresponding cost of translating it professionally, many organizations turn to MT. Popular examples of such companies are Skype (in addition to text translation, Microsoft developed automatic speech recognition (ASR) for audio speech translation in Skype) and Facebook. The social network aims to solve the challenge of fine-tuning a separate system for each language pair, using neural machine translation (NMT) and benefiting from various contexts for translations. One solution that tackles this issue is the technology developed by Language I/O. It takes the client's glossaries and TMs into account, selects the best MT engine output, and then improves the results using cultural intelligence and/or human linguists, who review machine translations after the fact to ensure that its MT Optimizer engine learns over time.
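The engine-selection step described above can be sketched as a simple heuristic. The engine names, the sample glossary, and the glossary-adherence score below are invented for illustration; Language I/O's actual selection logic is proprietary and not described here.

```python
# Hypothetical sketch: pick the best MT engine output by how well it
# honors a client glossary. Everything here (engine names, glossary,
# scoring rule) is an illustrative assumption.

def glossary_score(translation: str, glossary: dict[str, str]) -> int:
    """Count glossary target terms that appear in the translation."""
    text = translation.lower()
    return sum(1 for target in glossary.values() if target.lower() in text)

def select_best_output(candidates: dict[str, str], glossary: dict[str, str]) -> str:
    """Return the name of the engine whose output matches the most glossary terms."""
    return max(candidates, key=lambda engine: glossary_score(candidates[engine], glossary))

# English-to-German example with a client glossary of preferred terms.
glossary = {"ticket": "Support-Ticket", "refund": "Rückerstattung"}
candidates = {
    "engine_a": "Ihr Ticket wurde geschlossen und die Erstattung veranlasst.",
    "engine_b": "Ihr Support-Ticket wurde geschlossen und die Rückerstattung veranlasst.",
}
print(select_best_output(candidates, glossary))  # engine_b matches both glossary terms
```

A production system would of course combine many more signals (TM matches, quality estimation, human feedback), but the shape of the decision is the same: score each engine's candidate, keep the best.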
The world is only just getting used to the power and sophistication of virtual assistants made by companies like Amazon and Google, which can decode our speech with eerie precision compared to what the technology was capable of only a few short years ago. In truth, however, a far more impressive and mind-boggling milestone may be just around the corner, making speech recognition seem almost like child's play: artificial intelligence (AI) systems that can translate our brain activity into fully formed text, without hearing a single word uttered. Brain-machine interfaces have evolved in leaps and bounds over recent decades, proceeding from animal models to human participants, and are, in fact, already attempting this very kind of thing. Just not with much accuracy yet, researchers from the University of California San Francisco explain in a new study. To see if they could improve upon that, a team led by neurosurgeon Edward Chang of UCSF's Chang Lab used a new method to decode the electrocorticogram: the record of electrical impulses that occur during cortical activity, picked up by electrodes implanted in the brain.
Today, Yusuf Mehdi, Corporate Vice President of Modern Life and Devices, announced the availability of new Microsoft 365 Personal and Family subscriptions. In his blog, he shared a few examples of how Microsoft 365 is innovating to deliver experiences powered by artificial intelligence (AI) to billions of users every day. Whether through familiar products like Outlook and PowerPoint, or through new offerings such as Presenter Coach and Microsoft Editor across Word, Outlook, and the web, Microsoft 365 relies on Azure AI to offer new capabilities that make their users even more productive. Azure AI is a set of AI services built on Microsoft's breakthrough innovation from decades of world-class research in vision, speech, language processing, and custom machine learning. What is particularly exciting is that Azure AI provides our customers with access to the same proven AI capabilities that power Microsoft 365, Xbox, HoloLens, and Bing.
Speech-to-text (STT), also known as automatic speech recognition (ASR), has a long history and has made amazing progress over the past decade. Currently, it is often believed that only large corporations like Google, Facebook, or Baidu (or local state-backed monopolies for the Russian language) can provide deployable "in-the-wild" solutions. Following the success and democratization of computer vision (the so-called "ImageNet moment", i.e. the reduction of hardware requirements, time-to-market, and minimal dataset sizes needed to produce deployable products), it is logical to hope that other branches of Machine Learning (ML) will follow suit. The only questions are when it will happen and what the necessary conditions for it are. If those conditions are satisfied, one can develop new useful applications at reasonable cost. Democratization also follows: one no longer has to rely on giant companies such as Google as the only source of truth in the industry.
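As an illustration of what the final step of such an STT pipeline can look like, here is a minimal greedy CTC decode, one common (and deliberately simple) way to turn an acoustic model's per-frame character probabilities into text. The toy alphabet and probability matrix are invented for illustration; real models emit thousands of frames over a full alphabet.

```python
# Minimal greedy CTC decoding sketch: argmax each frame, collapse
# repeats, drop blanks. The alphabet and frame probabilities are toy
# values invented for this example.

BLANK = "_"  # CTC blank symbol
ALPHABET = [BLANK, "c", "a", "t"]

def greedy_ctc_decode(frame_probs: list[list[float]]) -> str:
    """Pick the most likely symbol per frame, collapse repeats, drop blanks."""
    best = [ALPHABET[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded = []
    prev = None
    for sym in best:
        if sym != prev and sym != BLANK:
            decoded.append(sym)
        prev = sym
    return "".join(decoded)

# Six frames whose argmax symbols are: c, c, _, a, t, t
frames = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.2, 0.1, 0.1, 0.6],
]
print(greedy_ctc_decode(frames))  # cat
```

Production systems typically replace this greedy step with beam search and a language model, but the greedy version shows why decoding no longer requires a giant company's infrastructure: it is a few lines of code on top of a trained model.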
When you hear the words artificial intelligence, what's the first thing that comes to mind? Driverless cars, Amazon shopping, Netflix movie recommendations, and trading software to help bankers. Many think of artificial intelligence in healthcare as a buzzword, or just a concept that will fully develop in the near future but has no impact on your life right now. Some other household examples of current-day technology that uses AI include Siri, Alexa, and Google Now: these popular speech recognition assistants all use artificial intelligence! Recently, Alexa was cleared to handle patient information.
Like the lipreading spies of yesteryear peering through their binoculars, almost all visual speech recognition (VSR) research these days focuses on mouth and lip motion. But a new study suggests that VSR models could perform even better if they used additional available visual information. The VSR field typically looks at the mouth region, since it is believed that lip shape and motion contain almost all the information correlated with speech; as a result, information from other facial regions has been considered weak by default. But a new paper from the Key Laboratory of Intelligent Information Processing of the Chinese Academy of Sciences and the University of Chinese Academy of Sciences proposes that information from extraoral facial regions can consistently benefit SOTA VSR model performance.
New evidence of voice recognition's racial bias problem has emerged. Speech recognition technologies developed by Amazon, Google, Apple, Microsoft, and IBM make almost twice as many errors when transcribing African American voices as they do with white American voices, according to a new Stanford study. All five systems produced these error rates even when the speakers were of the same gender and age, and saying the exact same words. We can't know for sure if these technologies are used in virtual assistants, such as Siri and Alexa, as none of the companies disclose this information. If they are, the products will be offering a vastly inferior service to a huge chunk of their users -- which can have a major impact on their daily lives.
Speech recognition systems harbor deep-rooted bias against people of color, a new study reveals. Stanford researchers found these technologies from Amazon, Apple, Google, IBM, and Microsoft make twice as many errors when interpreting words spoken by black people as when interpreting words spoken by whites. The team fed the systems nearly 2,000 speech samples from 115 individuals, 42 white and 73 black, and found the average error rate was 19 percent for whites and 35 percent for blacks. Apple performed the worst of the group, with a 45 percent error rate for black speakers and 23 percent for white speakers. Those involved with the study believe the inaccuracies stem from the data sets used to train the systems being designed predominantly by white people.
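Error rates like those above are typically measured as word error rate (WER): the word-level edit distance between the system's transcript and a reference transcript, divided by the reference length. A minimal sketch, with invented example transcripts:

```python
# Word error rate (WER) via word-level Levenshtein distance.
# The reference/hypothesis transcripts below are made up for illustration.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

ref = "please call the clinic tomorrow morning"
hyp = "please all the clinic tomorrow"
print(round(wer(ref, hyp), 2))  # one substitution + one deletion over 6 words -> 0.33
```

A 35 percent WER, as reported for black speakers, means roughly one in three words is transcribed wrongly, which makes the practical gap behind the study's headline numbers concrete.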
The study tested five publicly available tools from Apple, Amazon, Google, IBM and Microsoft that anyone can use to build speech recognition services. These tools are not necessarily what Apple uses to build Siri or Amazon uses to build Alexa. But they may share underlying technology and practices with services like Siri and Alexa. Each tool was tested last year, in late May and early June, and they may operate differently now. The study also points out that when the tools were tested, Apple's tool was set up differently from the others and required some additional engineering before it could be tested.
Tech companies are known to listen in on private conversations via their smart speakers in order to 'improve voice-recognition features.' Now that millions of people are working from home due to the coronavirus outbreak, employers are urging their staff to power down the technology in order to keep it from listening to confidential phone calls. Mishcon de Reya LLP, the UK law firm that advised Princess Diana on her divorce, advised staff to mute or shut off listening devices like Amazon's Alexa or Google's voice assistant when they talk about client matters at home, according to a partner at the firm. Video products such as Ring and baby monitors are also on the list of devices to be wary of while working from home, as first reported by Bloomberg. Mishcon de Reya partner Joe Hancock, who also heads the firm's cybersecurity efforts, told Bloomberg: 'Perhaps we're being slightly paranoid, but we need to have a lot of trust in these organizations and these devices. We'd rather not take those risks.'