About one in four U.S. consumers now has a home personal assistant at their beck and call, thanks to the success of smart speakers like the Amazon Echo and Google Nest: 25% will use a smart speaker in 2020, up from 17% in 2018, according to research firm eMarketer. But many users are just scratching the surface of what these gadgets can do. If you aren't familiar with the speakers (both lines starting at $35), you wake your artificial intelligence-driven helper with a keyword – "Alexa" for Amazon devices and "OK, Google" for a Google Nest or Google Home speaker – followed by a question or command. A human-like voice responds, whether you want to hear the weather, play a specific song, set a timer for the oven, or control the smart devices in your home, such as adjusting lighting or a thermostat.
The second generation of Google's smallest smart speaker gets a new name, a more eco-friendly build, a little more smarts and more bass. The £49 Nest Mini replaces the Google Home Mini as part of a revamped and renamed line of Google smart home products under the Nest brand, pushing its predecessor to a clearance price of only £19. From the outside you would be hard pressed to see what has changed: the Nest Mini sticks with the same pincushion design, with a fabric top and a nonslip rubber pad on the bottom. The top houses three far-field microphones and is touch-sensitive.
While the earlier decade was all about data communication and internet proliferation, the economy of the next few years will thrive on the digitization of systems, with smartness implemented into everything: smart homes, smart cities, smart appliances, smart retail, and so on. Demand for smart home appliances is increasing thanks to numerous advancements in, and adoptions of, digital technologies in everyday life. IoT solutions are one of the key focus areas of digital transformation projects in the consumer electronics and home appliances industry. With smart devices and appliances capable of sensing, connectivity, and data transmission, people can collect and analyze highly valuable data to automate operations at home that were previously performed manually. Today, technology has evolved to the point where meaningful collaboration between humans and machines can be designed, primarily due to advancements in AI.
The Fire TV Cube is Amazon's attempt to combine a smart TV streaming box with an Alexa-powered smart speaker, producing a small black box that doubles as an Echo device. The combination of shiny and matt black plastic makes it stand out at first, but the 86mm-wide, 77mm-tall cube is small enough not to be distracting sitting next to your TV. It's essentially a voice-controlled Echo Dot mated with a Fire TV smart television box. The top resembles an Echo Dot, with the same four-way configuration of buttons – volume up and down, microphone mute and an action button – plus a series of holes for the eight beam-forming mics. A light strip along the top front edge shows what Alexa is doing, lighting up blue when listening or orange for alerts.
Ask the silver-haired residents of the elderly care community Yinheyuan in central Beijing what they know about artificial intelligence (AI), and they will probably throw the question to the smart speakers within their reach. These smart speakers, capable of interacting with users via voice-recognition technologies, are also part of the answer. By voice command, senior residents can control lights, TVs and other home appliances, order food or ask for help. AI is no longer a technical term used exclusively by professionals in China. Both young and old are enjoying the benefits of the growing smart economy.
Current home appliances can execute only a limited number of voice commands, such as turning devices on or off or adjusting music volume or lighting. Recent progress in machine reasoning opens an opportunity to develop new types of conversational user interfaces for home appliances. In this paper, we apply a state-of-the-art visual reasoning model and demonstrate that it is feasible to ask a smart fridge about its contents and various properties of the food with a close-to-natural conversation experience. Our visual reasoning model answers user questions about the existence, count, category and freshness of each product by analyzing photos taken by the image sensor inside the smart fridge. Users may chat with their fridge using an off-the-shelf phone messenger while away from home, for example, when shopping in the supermarket. We generate a visually realistic synthetic dataset to train a machine-learning reasoning model that achieves 95% answer accuracy on test data. We present the results of initial user tests and discuss how we modify the distribution of generated questions for model training based on human-in-the-loop guidance. We open-source the code for the whole system, including dataset generation, the reasoning model and demonstration scripts.
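To make the question types concrete, here is a minimal sketch of the answering step, assuming the visual model has already turned a fridge photo into a list of detections. All names (`Detection`, `answer`, the fields) are illustrative stand-ins, not the paper's actual API, and the string matching is a toy substitute for the learned reasoning model.

```python
# Toy sketch: answering existence/count/freshness questions over detections
# that a visual model is assumed to have extracted from a fridge photo.
from dataclasses import dataclass

@dataclass
class Detection:
    name: str        # e.g. "apple"
    category: str    # e.g. "fruit"
    days_left: int   # estimated days until spoilage

def answer(question: str, items: list[Detection]) -> str:
    q = question.lower()
    if q.startswith("is there"):       # existence: "is there any milk?"
        target = q.removeprefix("is there any ").rstrip("?")
        return "yes" if any(d.name == target for d in items) else "no"
    if q.startswith("how many"):       # count: "how many fruit?"
        target = q.removeprefix("how many ").rstrip("?")
        n = sum(1 for d in items if target in (d.category, d.name))
        return str(n)
    if q.startswith("is the"):         # freshness: "is the milk fresh?"
        target = q.removeprefix("is the ").removesuffix(" fresh?")
        match = [d for d in items if d.name == target]
        if not match:
            return "not found"
        return "fresh" if match[0].days_left > 0 else "spoiled"
    return "unsupported question"

fridge = [Detection("milk", "dairy", 2),
          Detection("apple", "fruit", 5),
          Detection("banana", "fruit", 0)]
print(answer("is there any milk?", fridge))    # yes
print(answer("how many fruit?", fridge))       # 2
print(answer("is the banana fresh?", fridge))  # spoiled
```

In the paper's actual system the mapping from (photo, question) to answer is learned end-to-end on synthetic data; this hand-written dispatcher only illustrates the four supported question categories.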
Following the rollout of its cloudless, edge device-focused voice assistant stack, which comprises wake word, speech-to-text translation, and speech-to-intent capabilities, Picovoice announced a web console that lets you easily create and train your own voice models. Alongside the web console release, the company joined the Arm AI Ecosystem Partner Program, which gives Picovoice deeper access to ARM IP and to chip manufacturers like NXP. Specifically, Picovoice is focused on ARM Cortex-M chip designs, which are extremely low power and can integrate into all manner of IoT devices -- but are powerful enough to support its voice assistant without the need for a cloud connection. The big idea is that OEMs can use the Picovoice web console to whip up voice controls for their devices large and small, for minimal cost. Products with voice assistants on board are hot, and although the likes of smart speakers and smart displays get the bulk of the attention, some level of voice control is possible on all manner of lower-power edge devices, from coffee makers to lights.
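Picovoice's actual wake-word engine is a proprietary trained model, but the core idea of cloudless spotting can be illustrated with a crude stand-in: slide a stored template of audio feature frames over the incoming stream and fire when the average similarity crosses a threshold. Everything below (the 2-D "feature frames", the threshold value) is invented for illustration only.

```python
# Crude illustration of on-device wake-word spotting via template matching.
# Real engines (e.g. Picovoice Porcupine) use trained neural models; this
# toy version just compares incoming feature frames to a stored template.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def detect(stream, template, threshold=0.95):
    """Return start indices where the wake-word template matches the stream."""
    w = len(template)
    hits = []
    for i in range(len(stream) - w + 1):
        window = stream[i:i + w]
        # average the per-frame cosine similarity across the window
        score = sum(cosine(f, t) for f, t in zip(window, template)) / w
        if score >= threshold:
            hits.append(i)
    return hits

template = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]      # toy feature frames
stream = [[0.0, 0.1]] * 3 + template + [[0.2, 0.2]] * 2
print(detect(stream, template))   # [3]
```

The appeal for Cortex-M class chips is that this whole loop is a handful of multiply-accumulates per frame with no network round-trip, which is the property the trained models share even though their internals differ.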
Here's an interesting stat from the Pew Research Center: more than half of smart speaker owners in the US (54 percent) report saying "please" at least occasionally to their AI assistants, and one in five (19 percent) say it frequently. Curiously, the question of AI politeness also breaks down along gender lines, with 62 percent of women reporting that they say "please" at least sometimes, versus 45 percent of men. One possible explanation is that men are generally ruder to women, a category that now includes AI assistants coded as female. Experts have long noted that the design choices for AI bots could have misogynist effects by reinforcing gender stereotypes. "Because the speech of most voice assistants is female, it sends a signal that women are ... docile and eager-to-please helper," a report from the UN noted earlier this year.
When Amazon debuted the Amazon Echo in 2014, there were decidedly mixed reactions to the black, cylindrical Bluetooth speaker that could pick up voice commands. Few understood why the e-commerce giant had suddenly released a $199 speaker that could talk to you. Today we know that the Echo and other devices Amazon has since released are mere vessels for the real star of the show: Alexa. The voice assistant is available in 15 languages and 80 countries and boasts more than 100,000 "skills," compared to about a dozen five years ago. It can wake up your cat, serve as an interpreter, deter a burglar, help you work out, and streamline your workflow.
We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise. In real scenarios, these distortion sources may occur simultaneously, and reducing them implies combining the corresponding distortion-specific filters. As these filters interact with each other, they must be jointly optimized. We propose to model the target and residual signals after linear echo cancellation and dereverberation using a multichannel Gaussian modeling framework, and to jointly represent their spectra by means of a neural network. We develop an iterative block-coordinate ascent algorithm to update all the filters. We evaluate our system on real recordings of acoustic echo, reverberation and noise acquired with a smart speaker in various situations. In terms of overall distortion, the proposed approach outperforms both a cascade of the individual approaches and a joint reduction approach that does not rely on a spectral model of the target and residual signals. Index Terms: acoustic echo, reverberation, background noise, joint distortion reduction, expectation-maximization, recurrent neural network. The near-end speaker can be a few meters away from the microphones, and the interactions can be subject to several distortion sources such as background noise, acoustic echo and near-end reverberation. Each of these distortion sources degrades speech quality, intelligibility and listening comfort, and must be reduced. Single- and multichannel filters have been used to reduce each of these distortion sources independently. They can be categorized into short nonlinear filters that vary quickly over time and long linear filters that are time-invariant (or slowly time-varying). Short nonlinear filters are generally used for noise reduction. They are robust to the fluctuations and nonlinearities inherent to real signals. Long linear filters can be required for dereverberation and echo reduction.
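The linear echo cancellation stage that precedes the paper's joint optimization is classically built on an adaptive filter such as NLMS (normalized least-mean-squares). The sketch below shows that single-channel building block only, not the paper's full system (which adds dereverberation, a neural spectral model and block-coordinate updates); the toy echo path and parameters are invented for the example.

```python
# Minimal single-channel NLMS echo canceller: adaptively estimate the
# echo path from the far-end (loudspeaker) signal to the microphone and
# subtract the predicted echo, leaving the near-end residual.
import numpy as np

def nlms_echo_cancel(far_end, mic, filter_len=32, mu=0.5, eps=1e-8):
    w = np.zeros(filter_len)        # adaptive echo-path estimate
    x_buf = np.zeros(filter_len)    # most recent far-end samples, newest first
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_hat = w @ x_buf        # predicted echo at the mic
        e = mic[n] - echo_hat       # residual = near-end speech + error
        out[n] = e
        w += mu * e * x_buf / (x_buf @ x_buf + eps)  # NLMS update
    return out

rng = np.random.default_rng(0)
far = rng.standard_normal(4000)
true_path = np.array([0.6, -0.3, 0.2, 0.1])   # toy echo impulse response
echo = np.convolve(far, true_path)[:4000]     # mic picks up pure echo here
residual = nlms_echo_cancel(far, echo)
# after convergence the residual energy is far below the echo energy
print(np.mean(residual[2000:]**2) < 1e-3 * np.mean(echo[2000:]**2))
```

Such a filter is exactly the "long linear, slowly time-varying" category the text describes; the short nonlinear noise-reduction filters and the dereverberation filter then operate on its residual, which is why the paper argues the stages must be optimized jointly rather than cascaded.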