Collaborating Authors

An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email

Journal of Artificial Intelligence Research

This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), which supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone, and then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.
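The Q-learning component can be illustrated with a minimal tabular sketch. The states, actions, and reward model below are hypothetical stand-ins, not ELVIS's actual state space or its PARADISE-derived reward function; the example simply shows Watkins-style value updates driving strategy selection toward the higher-reward option.

```python
import random

ALPHA = 0.1  # learning rate
EPSILON = 0.1  # exploration rate for epsilon-greedy selection

def q_learning(episodes=2000, seed=0):
    rng = random.Random(seed)
    # One dialogue state, two candidate strategies
    # (e.g. system-initiative vs. mixed-initiative) -- illustrative only.
    actions = ["system_initiative", "mixed_initiative"]
    q = {a: 0.0 for a in actions}
    for _ in range(episodes):
        # Epsilon-greedy action selection.
        if rng.random() < EPSILON:
            a = rng.choice(actions)
        else:
            a = max(q, key=q.get)
        # Hypothetical stochastic reward: mixed initiative scores
        # better on average in this toy setup.
        reward = rng.gauss(0.6 if a == "mixed_initiative" else 0.4, 0.1)
        # Single-step (terminal) Q-learning update: no successor state,
        # so the target is just the observed reward.
        q[a] += ALPHA * (reward - q[a])
    return q

q = q_learning()
```

After training, the learned Q-values rank `mixed_initiative` above `system_initiative`, mirroring how the system's strategy choices would shift toward whichever option the learned reward favors.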

How to build smarter chatbots


We're going to be blunt: Chatbots in their current form aren't great. We were promised bots that would change the way we interact with businesses and services, but instead we have interactive bots that perform worse than apps. They rely primarily on taps and graphical interfaces, and conversing with them in natural language is nearly impossible. Take Poncho Weather on Facebook Messenger as an example. Let's say I'm going to a conference next Monday in San Diego and want to know what the forecast is.

A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents Artificial Intelligence

The amount of dialogue history to include in a conversational agent is often underestimated and/or set in an empirical and thus possibly naive way. This suggests that principled investigations into optimal context windows are urgently needed given that the amount of dialogue history and corresponding representations can play an important role in the overall performance of a conversational system. This paper studies the amount of history required by conversational agents for reliably predicting dialogue rewards. The task of dialogue reward prediction is chosen for investigating the effects of varying amounts of dialogue history and their impact on system performance. Experimental results using a dataset of 18K human-human dialogues report that lengthy dialogue histories of at least 10 sentences are preferred (25 sentences being the best in our experiments) over short ones, and that lengthy histories are useful for training dialogue reward predictors with strong positive correlations between target dialogue rewards and predicted ones.
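The experimental setup described above can be sketched minimally: truncate each dialogue's history to its last N sentences, then score a reward predictor by the Pearson correlation between target and predicted rewards. The function names below are illustrative, not taken from the paper.

```python
import math

def truncate_history(sentences, n):
    """Keep only the most recent n sentences of a dialogue history."""
    return sentences[-n:]

def pearson(targets, predictions):
    """Pearson correlation between target and predicted dialogue rewards."""
    n = len(targets)
    mx = sum(targets) / n
    my = sum(predictions) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(targets, predictions))
    sx = math.sqrt(sum((x - mx) ** 2 for x in targets))
    sy = math.sqrt(sum((y - my) ** 2 for y in predictions))
    return cov / (sx * sy)
```

Sweeping `n` over candidate window sizes (e.g. 5, 10, 25 sentences) and comparing the resulting correlations is the kind of context-window comparison the abstract reports, with 25 sentences performing best in the authors' experiments.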

Jobs at


Our start-up began two years ago and we have successfully closed a 23M round-B funding round. We have an awesome dataset, an awesome team of data scientists, and an equally awesome NLP challenge. We have the ability to quickly produce labeled datasets and test novel NLP techniques, including semantic parsing, deep learning (convolutional, recurrent, and recursive neural nets), and various forms of dialogue modeling. We are looking for PhD-level candidates (or equivalent) with a strong background in semantic parsers, entity extraction from text, or human-agent dialogue modeling. The candidate is expected to design, implement, and lead the evolution of one of our critical NLP tasks together with a team.