Meet Lena -- A Simple AI No-Code ChatBot.


Power Virtual Agents (Power VA) is the newest member of Microsoft's low-code data platform, the Power Platform, and lets you build AI-backed chatbots with no code. Ready to build a bot on your own? In this introductory story, I will guide you through signing up, authoring a bot, and publishing it on your personal website with absolutely no code. Go to the Power Virtual Agents marketing page and select Start Free. Sign in with your Microsoft work or school account, or sign up if you don't have one.

Voice trigger detection from LVCSR hypothesis lattices using bidirectional lattice recurrent neural networks (Machine Learning)

We propose a method to reduce false voice triggers of a speech-enabled personal assistant by post-processing the hypothesis lattice of a server-side large-vocabulary continuous speech recognizer (LVCSR) via a neural network. We first discuss how an estimate of the posterior probability of the trigger phrase can be obtained from the hypothesis lattice using known techniques to perform detection, then investigate a statistical model that processes the lattice in a more explicitly data-driven, discriminative manner. We propose using a Bidirectional Lattice Recurrent Neural Network (LatticeRNN) for the task, and show that it can significantly improve detection accuracy over using the 1-best result or the posterior.
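The posterior estimate mentioned above can be made concrete on a toy example. The sketch below is purely illustrative and assumes a minimal arc-list lattice representation (the node layout, the arc probabilities, and the `trigger_posterior` helper are all our inventions, not the paper's actual LVCSR lattices): it sums the probability mass of hypothesis paths that begin with the trigger phrase and divides by the total path mass.

```python
# A toy hypothesis lattice: arcs as (src, dst, word, arc_prob).
# Node 0 is the start node, node 3 the final node.
ARCS = [
    (0, 1, "hey", 0.7), (0, 1, "hay", 0.3),
    (1, 2, "siri", 0.8), (1, 2, "series", 0.2),
    (2, 3, "play", 1.0),
]

def paths(arcs, src, final):
    """Recursively enumerate (word_sequence, path_probability) pairs."""
    if src == final:
        yield [], 1.0
    for (a, b, w, p) in arcs:
        if a == src:
            for words, prob in paths(arcs, b, final):
                yield [w] + words, p * prob

def trigger_posterior(arcs, final, trigger):
    """Posterior of the trigger phrase: mass of paths that start with it,
    normalised by the total path mass."""
    total, hit = 0.0, 0.0
    for words, prob in paths(arcs, 0, final):
        total += prob
        if words[:len(trigger)] == list(trigger):
            hit += prob
    return hit / total

print(trigger_posterior(ARCS, 3, ("hey", "siri")))  # ≈ 0.56 (= 0.7 * 0.8)
```

The paper's contribution is to go beyond such a fixed posterior computation and let a LatticeRNN process the lattice discriminatively; the exhaustive path enumeration here is only tractable because the toy lattice is tiny.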

Apple details AI to help voice assistants recognize hotwords and multilingual speakers


Speech recognition is a key area of interest for Apple, whose cross-platform Siri virtual assistant is used by over 500 million customers worldwide. This past week, the tech giant published a series of preprint research papers investigating techniques to improve voice trigger detection and speaker verification, as well as language identification for multiple speakers. In the first of the papers, a team of Apple researchers proposes an AI model trained to perform both automatic speech recognition and speaker recognition. As they explain in the abstract, the commands recognized by speech-based personal assistants are usually prefixed with a trigger phrase (e.g., "Hey, Siri"), and detecting this trigger phrase involves two steps. The AI first must decide whether the phonetic content in the input audio matches that of the trigger phrase (voice trigger detection), and then it must determine whether the speaker's voice matches the voice of a registered user or users (speaker verification).
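The two-step gate described in that abstract can be sketched in a few lines. Everything below is a hedged toy: the thresholds, the embedding vectors, and the use of cosine similarity for speaker verification are common practice in the field, but none of it is taken from Apple's paper. Step 1 gates on a phonetic trigger score; step 2 compares a speaker embedding of the utterance against the enrolled users' embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def accept(trigger_score, utterance_emb, enrolled_embs,
           trigger_thresh=0.5, speaker_thresh=0.7):
    # Step 1: does the audio phonetically match the trigger phrase?
    if trigger_score < trigger_thresh:
        return False
    # Step 2: does the voice match any registered user?
    return max(cosine(utterance_emb, e) for e in enrolled_embs) >= speaker_thresh

enrolled = [(1.0, 0.0), (0.0, 1.0)]          # toy speaker profiles
print(accept(0.9, (0.9, 0.1), enrolled))     # True: trigger and speaker match
print(accept(0.2, (0.9, 0.1), enrolled))     # False: fails the phonetic gate
```

The point of the paper is that a single network can serve both steps; this sketch only shows why the two decisions are logically separate.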

Multi-task Learning for Voice Trigger Detection (Machine Learning)

We describe the design of a voice trigger detection system for smart speakers. In this study, we address two major challenges. The first is that the detectors are deployed in complex acoustic environments with external noise and loud playback by the device itself. The second is that collecting training examples for a specific keyword or trigger phrase is difficult, resulting in a scarcity of trigger-phrase-specific training data. We describe a two-stage cascaded architecture where a low-power detector is always running and listening for the trigger phrase. If a detection is made at this stage, the candidate audio segment is re-scored by larger, more complex models to verify that the segment contains the trigger phrase. In this study, we focus our attention on the architecture and design of these second-pass detectors. We start by training a general acoustic model that produces phonetic transcriptions given a large labelled training dataset. Next, we collect a much smaller dataset of examples that are challenging for the baseline system. We then use multi-task learning to train a model to simultaneously produce accurate phonetic transcriptions on the larger dataset and discriminate between true and easily confusable examples using the smaller dataset. Our results demonstrate that the proposed model reduces errors by half compared to the baseline in a range of challenging test conditions, without requiring extra parameters.
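The always-on/second-pass cascade described above has a simple control flow, sketched here with stand-in scoring functions. The thresholds and the lambda "models" are illustrative assumptions; in the actual system both stages are neural networks.

```python
def cascade_detect(audio, first_pass, second_pass,
                   gate_thresh=0.3, final_thresh=0.7):
    """Run the cheap always-on detector first; only candidates that clear
    its gate pay the cost of the larger second-pass model."""
    if first_pass(audio) < gate_thresh:
        return False                       # most audio is rejected cheaply
    return second_pass(audio) >= final_thresh

# Stand-in models: each "score" is just a field of a toy audio dict.
cheap = lambda a: a["energy"]
big = lambda a: a["match"]
print(cascade_detect({"energy": 0.9, "match": 0.95}, cheap, big))  # True
print(cascade_detect({"energy": 0.1, "match": 0.95}, cheap, big))  # False
```

The design rationale is power: the first stage must be cheap enough to run continuously on-device, while the expensive verification only runs on the rare candidate segments.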

Multi-task Learning for Speaker Verification and Voice Trigger Detection (Machine Learning)

Automatic speech transcription and speaker recognition are usually treated as separate tasks even though they are interdependent. In this study, we investigate training a single network to perform both tasks jointly. We train the network in a supervised multi-task learning setup, where the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss while the speaker recognition branch is trained to assign the input sequence the correct speaker label. We present a large-scale empirical study where the model is trained using several thousand hours of labelled training data for each task. We evaluate the speech transcription branch of the network on a voice trigger detection task while the speaker recognition branch is evaluated on a speaker verification task. Results demonstrate that the network is able to encode both phonetic and speaker information in its learnt representations while yielding accuracies at least as good as the baseline models for each task, with the same number of parameters as the independent models.
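A minimal sketch of how two branch objectives over a shared encoder can be combined into one training loss. The interpolation weight is our assumption for illustration; the paper trains both branches jointly, but we are not reproducing its exact objective.

```python
def joint_loss(ctc_loss, speaker_loss, weight=0.5):
    """One scalar objective for a shared encoder: the transcription branch
    contributes a phonetic CTC term, the speaker branch a
    speaker-classification term, interpolated by a tunable weight."""
    return weight * ctc_loss + (1.0 - weight) * speaker_loss

print(joint_loss(2.0, 4.0))  # 3.0
```

In practice each term is backpropagated through its own head into the shared trunk, which is how the single representation comes to encode both phonetic and speaker information.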

Google Home Routines: How to put them to use


Unless you've dug deep into the settings menu for Google Home, you might not know about the smart speaker's most powerful feature. It's called Routines, and it allows you to execute multiple actions with a single voice command. For example, you can have Google Assistant announce the weather, a personalized traffic report, and news updates while you get ready for work, or have it dim your smart light bulbs and play some relaxing music a few minutes before bedtime. These routines even work with the Google Assistant app on iOS and Android--no smart speaker required. You can also schedule Routines to run at specific times without voice commands, effectively turning a Google Home speaker into a high-tech alarm clock that can wake you up with music, information, and smart home automations.

Avaya adds AI voice assistant to desk phones


With the release of a voice assistant for desk phones, Avaya is the latest unified communications vendor to explore whether the artificial intelligence technology increasingly popular among consumers has value in the enterprise market.

Harman Kardon Invoke hands-on: Cortana enters the smart speaker market with a boom


Harman Kardon's Invoke speaker, debuting Thursday for $199.95, may end up following in the footsteps of notable Microsoft-powered devices like Nokia's Windows phones: lovely hardware that's slightly tripped up by Microsoft's software and services.

4 Ways Amazon Could Make the Echo More Useful

TIME - Tech

We've long been used to talking to our technology. Apple's Siri first launched six years ago, after all. But industry experts say that entirely voice-controlled gadgets, like Amazon's Echo smart speakers, are getting us more comfortable than ever with bossing around our tech. The Echo works like this: You put one in your home and connect it to your Wi-Fi network. Then, after a bit of customization, you're able to order it to do certain tasks using one of several "wake words," like "Alexa."

A Murder Case Tests Alexa's Devotion to Your Privacy


The Amazon Echo can seem like your best friend--until it betrays you. That's because this device is different from anything else in your house. Alexa, the voice assistant that powers Echo and more, is always listening, sending what you say after using a "wake" word to Amazon's servers. Of course, Echo isn't the only voice-assistant speaker on the market, but it sits in millions of homes, and Alexa is headed to devices from companies like Ford, Dish, Samsung, and Whirlpool.