Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction, and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that optimizing its dialogue policy via reinforcement learning measurably improves NJFun's performance.
Although it is now possible to build real-time spoken dialogue systems for a wide variety of applications, designing the dialogue strategy of such systems involves a number of nontrivial design choices. These choices can seriously impact system performance, and include choosing appropriate strategies for initiative (whether to accept relatively open-ended versus constrained user utterances) and confirmation (when to confirm a user's previous utterance). Typically, dialogue strategy design has been done in an ad hoc manner. This paper summarizes how our recent research has addressed the design, implementation, and experimental evaluation of methods for dynamically adapting and optimizing a system's dialogue strategy, in order to improve system performance. We have evaluated the utility of allowing users to dynamically adapt (at any point(s) in a dialogue) the initial set of initiative and confirmation dialogue strategies used by TOOT, a spoken dialogue system for retrieving train schedules on the web.
This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), which supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders.
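To make the learning component concrete, the following is a minimal tabular Q-learning sketch in the spirit of the method described above. The dialogue states, action names, and reward values are hypothetical illustrations, not the actual ELVIS state space or learned PARADISE reward function.

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

# Hypothetical dialogue states and the strategy choices available in each.
actions = {
    "greeting": ["system_initiative", "mixed_initiative"],
    "folder_summary": ["summarize_by_sender", "summarize_by_subject"],
    "read_message": ["read_first", "read_by_sender"],
}

# Q-table mapping (state, action) pairs to estimated value.
Q = {(s, a): 0.0 for s in actions for a in actions[s]}

def choose_action(state):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(actions[state])
    return max(actions[state], key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Standard Q-learning update (Watkins, 1989)."""
    future = 0.0
    if next_state in actions:  # terminal states have no future value
        future = max(Q[(next_state, a)] for a in actions[next_state])
    Q[(state, action)] += ALPHA * (reward + GAMMA * future - Q[(state, action)])

# One simulated training episode with a terminal reward of 1.0.  In the
# real method, this final reward would come from a PARADISE-style
# performance function fit to data from user interactions.
q_update("greeting", "mixed_initiative", 0.0, "folder_summary")
q_update("folder_summary", "summarize_by_sender", 0.0, "read_message")
q_update("read_message", "read_first", 1.0, "done")
```

Over many such episodes, the terminal reward propagates back through the Q-table, so that per-state strategy choices (e.g., initiative, summarization style) come to reflect their contribution to overall dialogue performance.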
Spoken dialogue system performance can vary widely for different users, as well as for the same user during different dialogues. This paper presents the design and evaluation of an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Adaptive TOOT predicts whether a user is having speech recognition problems as a particular dialogue progresses, and automatically adapts its dialogue strategies based on its predictions. An empirical evaluation of the system demonstrates the utility of the approach.
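The predict-then-adapt loop described above can be sketched as follows. The feature names, threshold values, and fallback strategy are illustrative assumptions, not Adaptive TOOT's actual predictor or strategy set; a real system would use a model trained on labeled dialogue data.

```python
def predict_misrecognition(asr_confidence, rejections, reprompts):
    """Toy rule-based predictor of speech recognition trouble.

    asr_confidence: recognizer score for the last utterance (0.0-1.0).
    rejections, reprompts: counts of trouble indicators so far.
    (Thresholds here are illustrative, not learned.)
    """
    return asr_confidence < 0.4 or (rejections + reprompts) >= 2

def adapt_strategy(strategy, asr_confidence, rejections, reprompts):
    """Fall back to a conservative strategy when trouble is predicted."""
    if predict_misrecognition(asr_confidence, rejections, reprompts):
        return {"initiative": "system", "confirmation": "explicit"}
    return strategy

# Mid-dialogue check: low confidence plus prior reprompts triggers the
# switch from a permissive to a conservative strategy.
current = {"initiative": "user", "confirmation": "implicit"}
current = adapt_strategy(current, asr_confidence=0.3, rejections=1, reprompts=1)
```

The design intuition is that constrained system initiative and explicit confirmation trade dialogue efficiency for robustness, which pays off precisely when recognition is failing.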