
Here's how Alexa learned to speak Spanish without your help


The first tool studies a handful of "golden utterances" (that is, reference commands suggested by the developers) to learn general syntax and semantics patterns. After that, it produces "rewrite expressions" that themselves create thousands of new yet similar sentences to work from. The system works quickly -- you could move from 50 utterances to a fully operational linguistic set in less than two days. Amazon's other tool uses guided resampling to replace terms that can be safely swapped, further improving the AI's training. The technique draws on both data from existing Alexa languages and media sources like the Amazon Music catalog, and it's capable enough to be aware of context (it won't swap a musician's name for an audiobook, for example).
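The two steps described above can be pictured with a toy sketch. This is only an illustration under stated assumptions, not Amazon's implementation: the template syntax, the `CATALOG` dictionary, and the `expand` and `resample` functions are all hypothetical names invented here.

```python
import itertools
import random

# Hypothetical slot catalog: terms are only swapped within the same slot type,
# so an artist name never replaces an audiobook title.
CATALOG = {
    "artist": ["Shakira", "Juanes"],
    "audiobook": ["Don Quijote", "Rayuela"],
}

def expand(template, options):
    """Expand one template into every sentence its slot options allow."""
    keys = list(options)
    sentences = []
    for combo in itertools.product(*(options[k] for k in keys)):
        sentences.append(template.format(**dict(zip(keys, combo))))
    return sentences

def resample(phrases, slot_types, rng):
    """Swap each slot-tagged phrase for a random term of the same slot type."""
    out = []
    for phrase, slot in zip(phrases, slot_types):
        out.append(rng.choice(CATALOG[slot]) if slot in CATALOG else phrase)
    return out

# A few seed choices multiply into many similar sentences.
variants = expand(
    "{verb} {thing} by {artist}",
    {"verb": ["play", "put on"], "thing": ["music", "songs"], "artist": CATALOG["artist"]},
)

# Only the phrase tagged "artist" is eligible for replacement.
resampled = resample(["play", "songs", "by", "Shakira"], [None, None, None, "artist"], random.Random(0))
```

Keeping the catalog keyed by slot type is what makes a swap "safe" in this sketch: a replacement is always drawn from the same category as the term it replaces.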

Improving Multi-turn Dialogue Modelling with Utterance ReWriter

Recent research has made impressive progress in single-turn dialogue modelling. In the multi-turn setting, however, current models are still far from satisfactory. One major challenge is the frequent coreference and information omission in daily conversation, which make it hard for machines to understand the real intention. In this paper, we propose rewriting the human utterance as a pre-processing step to help multi-turn dialogue modelling. Each utterance is first rewritten to recover all coreferred and omitted information. The next processing steps are then performed based on the rewritten utterance. To properly train the utterance rewriter, we collect a new dataset with human annotations and introduce a Transformer-based utterance rewriting architecture using the pointer network. We show the proposed architecture achieves remarkably good performance on the utterance rewriting task. The trained utterance rewriter can be easily integrated into online chatbots and brings general improvement over different domains.

Ranking Enhanced Dialogue Generation

How to effectively utilize the dialogue history is a crucial problem in multi-turn dialogue generation. Previous works usually employ various neural network architectures (e.g., recurrent neural networks, attention mechanisms, and hierarchical structures) to model the history. However, a recent empirical study by Sankar et al. has shown that these architectures lack the ability to understand and model the dynamics of the dialogue history. For example, the widely used architectures are insensitive to perturbations of the dialogue history, such as word shuffling, missing utterances, and utterance reordering. To tackle this problem, we propose a Ranking Enhanced Dialogue generation framework in this paper. Besides the traditional representation encoder and response generation modules, an additional ranking module is introduced to model the ranking relation between the former utterance and consecutive utterances. Specifically, the former utterance and consecutive utterances are treated as query and corresponding documents, and both local and global ranking losses are designed in the learning process. In this way, the dynamics in the dialogue history can be explicitly captured. To evaluate our proposed models, we conduct extensive experiments on three public datasets, i.e., bAbI, PersonaChat, and JDC. Experimental results show that our models produce better responses in terms of both quantitative measures and human judgments, as compared with the state-of-the-art dialogue generation models. Furthermore, we give a detailed experimental analysis to show where the improvements come from and how they arise.
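The three kinds of history perturbation named in the abstract can be made concrete with a small sketch. The function names and the list-of-strings representation of a dialogue history are assumptions made here for illustration; the abstract does not say how the cited study implemented them.

```python
import random

def shuffle_words(history, rng):
    """Shuffle the words inside each utterance, keeping utterance order."""
    shuffled = []
    for utterance in history:
        words = utterance.split()
        rng.shuffle(words)
        shuffled.append(" ".join(words))
    return shuffled

def drop_utterance(history, index):
    """Remove one utterance from the history (a missing utterance)."""
    return history[:index] + history[index + 1:]

def reverse_utterances(history):
    """Reorder the history; reversal is one simple reordering."""
    return history[::-1]

history = ["hi there", "how are you today", "fine thanks"]
perturbed = shuffle_words(history, random.Random(0))
```

A model that is sensitive to the dynamics of the history would be expected to respond differently to `history` and to its perturbed versions; insensitivity means its outputs barely change.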

Paraphrase Augmented Task-Oriented Dialog Generation

Neural generative models have achieved promising performance on dialog generation tasks when given a huge data set. However, the lack of high-quality dialog data and the expensive data annotation process greatly limit their application in real-world settings. We propose a paraphrase augmented response generation (PARG) framework that jointly trains a paraphrase model and a response generation model to improve the dialog generation performance. We also design a method to automatically construct a paraphrase training data set based on dialog state and dialog act labels. PARG is applicable to various dialog generation models, such as TSCP (Lei et al., 2018) and DAMD (Zhang et al., 2019). Experimental results show that the proposed framework improves these state-of-the-art dialog models further on CamRest676 and MultiWOZ. PARG also significantly outperforms other data augmentation methods in dialog generation tasks, especially under low resource settings.
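One plausible reading of constructing paraphrase data from dialog state and dialog act labels is to treat utterances that carry identical labels as paraphrases of one another. The sketch below follows that reading only; the turn format, the field names, and `build_paraphrase_pairs` are hypothetical, and the paper's actual matching and filtering procedure may differ.

```python
from collections import defaultdict
from itertools import permutations

def build_paraphrase_pairs(turns):
    """Pair up utterances that share the same dialog state and dialog act."""
    groups = defaultdict(list)
    for turn in turns:
        # Sort the state items so the key does not depend on dict ordering.
        key = (tuple(sorted(turn["state"].items())), turn["act"])
        groups[key].append(turn["utterance"])

    pairs = []
    for utterances in groups.values():
        unique = list(dict.fromkeys(utterances))  # de-duplicate, keep order
        pairs.extend(permutations(unique, 2))     # (source, target) both ways
    return pairs

turns = [
    {"utterance": "i want cheap thai food", "state": {"food": "thai", "price": "cheap"}, "act": "inform"},
    {"utterance": "find me an inexpensive thai place", "state": {"price": "cheap", "food": "thai"}, "act": "inform"},
    {"utterance": "what is the address", "state": {"food": "thai", "price": "cheap"}, "act": "request"},
]
pairs = build_paraphrase_pairs(turns)
```

Each resulting pair could serve as a (source, target) example for a paraphrase model without any extra manual annotation, since the labels already exist in task-oriented dialog corpora.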

Amazon's AI rewrites 'millions' of Alexa user commands to reduce defects by 30%


The AI underlying assistants like Alexa gets better in part through manual data transcription and annotation, which takes outsized time and effort. In pursuit of a more scalable approach, scientists at Amazon -- noting that people tend to reformulate misinterpreted commands -- leveraged feedback from interactions to glean insights. In a paper detailing their work, they say that the automated self-learning system they deployed reduced errors across "millions" of Alexa customers. It's yet another step for Amazon along the way to a largely unsupervised and more human-like Alexa, as scientists and product managers from the company told VentureBeat in September. Such techniques have imbued Alexa with better contextual understanding of its surroundings with respect to smart home devices, as well as the ability to detect emotions like frustration in users' voices.