"Automatic speech recognition (ASR) is one of the fastest growing and commercially most promising applications of natural language technology. Speech is the most natural communicative medium for humans in many situations, including applications such as giving dictation; querying database or information-retrieval systems; or generally giving commands to a computer or other device, especially in environments where keyboard input is awkward or impossible (for example, because one's hands are required for other tasks)."
– Andreas Stolcke, "Linguistic Knowledge and Empirical Methods in Speech Recognition," AI Magazine 18(4): 25–32, 1997.
If you want to control a Roku player or Roku smart TV by voice, you have lots of options. Many Roku devices include a remote control that supports voice commands, and you can also control Roku hands-free with an Amazon Echo or Google Home smart speaker. But along with all those voice control options come several limitations, especially when it comes to launching videos or TV channels directly. Knowing what Roku can and can't do will spare you some headaches when you're barking out orders. We'll walk through how to get set up with Roku voice controls, list the supported voice commands, and share some tips for making your experience smoother.
If you're one of the few people who own a Google Pixel phone, you'll soon be able to experience voice recognition without the internet. Google has announced the rollout of "an end-to-end, all-neural, on-device speech recognizer to power speech input in Gboard", the company's keyboard with Google Search baked in. The technology could give Google an edge over Siri and Alexa in convincing people to talk to machines through phones and home speakers that can deliver answers faster, by cutting down the latency that comes with sending a request from a device to a remote server and waiting for a response. The company has enabled on-device voice recognition by miniaturizing a machine-learning model that can do the task on a phone rather than handing off the job to a server in the cloud. Google researchers detailed the on-device technique in a paper published on arXiv.org in November called 'Streaming End-to-end Speech Recognition For Mobile Devices'.
Google has updated its Gboard keyboard app for Android with AI-powered dictation that works offline. The company says it has effectively miniaturized a cloud-based neural network system for speech recognition into an 80MB mobile app update, and that it'll allow for faster and more reliable dictation on the go. That's big, because it means your phone doesn't need to connect to a server to deliver high-quality speech recognition results – and you also don't need access to a high-speed Wi-Fi network to use the feature. The new system has been in the works since 2014, and it eschews the traditional three-step process for speech recognition in favor of a single-step solution.
You can now dictate your texts with Google's Gboard keyboard even when you're offline, at least if you use a Pixel. Google's AI team announced that it updated Gboard's speech recognizer to recognize characters one by one as they're spoken, and the model is now hosted directly on the device. By no longer having to send data over the internet, Gboard's voice typing should now be faster and more reliable. Google explained in a blog post that it wanted to create a speech recognizer that was "compact enough to reside on a phone" and wouldn't be derailed by unreliable Wi-Fi or mobile networks. Voice recognition traditionally works by breaking apart the words you speak into smaller parts known as phonemes, according to Science Line.
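The traditional word-to-phoneme step mentioned above can be pictured as a simple lexicon lookup. A minimal sketch in Python, using ARPAbet-style phoneme entries in the spirit of the CMU Pronouncing Dictionary (this tiny lexicon is illustrative, not an actual dictionary excerpt):

```python
# Toy illustration of the traditional word -> phoneme decomposition step.
# Real recognizers use large pronunciation lexicons; these three entries
# are hand-written examples in ARPAbet notation.
LEXICON = {
    "hello":  ["HH", "AH", "L", "OW"],
    "speech": ["S", "P", "IY", "CH"],
    "world":  ["W", "ER", "L", "D"],
}

def to_phonemes(utterance):
    """Break each recognized word into its phoneme sequence."""
    return [LEXICON[w] for w in utterance.lower().split() if w in LEXICON]

print(to_phonemes("Hello world"))
# prints [['HH', 'AH', 'L', 'OW'], ['W', 'ER', 'L', 'D']]
```

A character-level recognizer like the new Gboard model sidesteps this lookup entirely, emitting letters directly rather than going through a phonetic lexicon.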
Voice recognition is a standard part of the smartphone package these days, and a corresponding part is the delay while you wait for Siri, Alexa or Google to return your query, either correctly interpreted or horribly mangled. Google's latest speech recognition works entirely offline, eliminating that delay altogether -- though of course mangling is still an option. The delay occurs because your voice, or some data derived from it anyway, has to travel from your phone to the servers of whoever operates the service, where it is analyzed and a result is sent back a short time later. This can take anywhere from a handful of milliseconds to multiple entire seconds (what a nightmare!), or longer if your packets get lost in the ether. Why not just do the voice recognition on the device?
On-device machine learning algorithms afford plenty of advantages, namely low latency and availability -- because processing is performed locally as opposed to remotely on a server, connectivity has no bearing on performance. Google sees the wisdom in this: It today announced that Gboard, its cross-platform virtual keyboard app, now uses an end-to-end recognizer to power American English speech input on Pixel smartphones. "This means no more network latency or spottiness -- the new recognizer is always available, even when you are offline," Johan Schalkwyk, a fellow on Google's Speech Team, wrote in a blog post. "The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you'd expect from a keyboard dictation system." It's more complicated than it sounds.
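The character-by-character behavior Schalkwyk describes can be pictured as a stream that emits output symbols as audio frames arrive, redrawing the text field each time. A minimal Python sketch (the recognizer here is a stand-in stub; the real system decodes a neural model over audio features on-device):

```python
def fake_recognizer(frames):
    """Stand-in for the on-device model: maps each audio 'frame' to the
    characters decoded from it -- possibly none for a given frame."""
    for frame in frames:
        yield frame.get("chars", "")

def stream_transcript(frames):
    """Emit the transcript character by character, as if someone were
    typing out what you say in real time."""
    transcript = ""
    for chars in fake_recognizer(frames):
        for ch in chars:
            transcript += ch
            print(transcript)  # a keyboard UI would redraw the text field here
    return transcript

frames = [{"chars": "he"}, {"chars": ""}, {"chars": "llo"}]
final = stream_transcript(frames)
```

The point of the sketch is the shape of the loop: output appears incrementally as input arrives, rather than in one batch after a server round trip.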
Students of the Brihanmumbai municipal school in Andheri's DN Nagar are now privy to that savoury secret after the Google boss' visit here last week. And when a student asked Pichai what it takes to be an engineer, the boy-next-door-turned-Silicon Valley pin-up said, "Do you have a radio and TV at home? When it gets old, just learn to break that apart." It was the perfect photo-op. But Pichai used this opportunity to see how Bolo -- a reader app powered by Google AI for text-to-speech and speech recognition -- works on the ground.
It wasn't that long ago that talking to computers was the preserve of movies and science fiction. Slowly, voice recognition improved, and these days it's getting to be pretty usable. The technology has moved beyond basic keywords, and can now parse sentences in natural language. The device is built around Google's AIY Voice Kit, which consists of a Raspberry Pi with some additional hardware and software to enable it to process voice queries. This allows WhatIsThat to respond to users asking questions by taking a photo, and then identifying what it sees in the frame.
Google, which already dominates India's smartphone, search, and online video market, today launched a learning app for primary school children in the country as part of an effort to cement its grip on the world's fastest-growing internet market. The Android app, called Bolo, aims to help young kids improve their reading comprehension and vocabulary skills in Hindi and English. Bolo (the Hindi word for "speak") features a range of games and tasks, and it rewards kids as they progress. Bolo, which is powered by Google's speech recognition and text-to-speech technology, first asks kids to read sentences. The app then listens to their efforts and reviews them, and an animated voice assistant -- called Diya -- suggests pronunciation and vocabulary corrections wherever applicable.
Speech recognition is pretty darn good these days. State-of-the-art models like EdgeSpeechNet, which was detailed in a research paper late last year, are capable of achieving about 97 percent accuracy. But even the best systems sometimes stumble on uncommon and rare words. To narrow the gap, scientists at Google and the University of California propose an approach that taps a spelling correction model trained on text-only data, detailed in a paper published on the preprint server arXiv.org.
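The researchers' correction model is a neural sequence model; as a rough stand-in for the idea, here is a toy post-hoc corrector that snaps unrecognized ASR tokens to the nearest word in a vocabulary harvested from text-only data. The use of `difflib` similarity matching is my simplification for illustration, not the paper's method:

```python
import difflib

# Vocabulary built from text-only data -- the key idea is that the
# corrector never needs paired audio, just text containing rare words.
VOCAB = ["tchaikovsky", "recognition", "symphony", "concerto"]

def correct(tokens, vocab=VOCAB):
    """Replace each out-of-vocabulary token with its closest vocab match,
    leaving tokens alone when nothing similar enough exists."""
    fixed = []
    for tok in tokens:
        if tok in vocab:
            fixed.append(tok)
        else:
            match = difflib.get_close_matches(tok, vocab, n=1, cutoff=0.6)
            fixed.append(match[0] if match else tok)
    return fixed

print(correct(["chaikovsky", "symphony"]))
# prints ['tchaikovsky', 'symphony']
```

Even this crude version shows why text-only training data helps: rare proper nouns that an acoustic model garbles can be recovered from how they are spelled in ordinary text.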