A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents

arXiv.org Artificial Intelligence

The amount of dialogue history to include in a conversational agent is often underestimated and/or set in an empirical and thus possibly naive way. This suggests that principled investigations into optimal context windows are urgently needed given that the amount of dialogue history and corresponding representations can play an important role in the overall performance of a conversational system. This paper studies the amount of history required by conversational agents for reliably predicting dialogue rewards. The task of dialogue reward prediction is chosen for investigating the effects of varying amounts of dialogue history and their impact on system performance. Experimental results using a dataset of 18K human-human dialogues report that lengthy dialogue histories of at least 10 sentences are preferred (25 sentences being the best in our experiments) over short ones, and that lengthy histories are useful for training dialogue reward predictors with strong positive correlations between target dialogue rewards and predicted ones.
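As a rough illustration of the quantity being varied, the sketch below truncates a dialogue to its most recent N sentences and scores a set of predictions by the correlation between target and predicted rewards. The function names and the choice of Pearson correlation are assumptions made for illustration; the abstract does not specify the correlation measure or the predictor.

```python
import math
from typing import List, Sequence


def truncate_history(sentences: Sequence[str], max_sentences: int) -> List[str]:
    """Keep only the most recent `max_sentences` sentences of a dialogue."""
    if max_sentences <= 0:
        return []
    return list(sentences[-max_sentences:])


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between target rewards and predicted rewards."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_x * var_y)
```

In an experiment of this kind, one would train a predictor on histories truncated at several values of `max_sentences` and compare the resulting correlations on held-out dialogues.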


Subgoal Discovery for Hierarchical Dialogue Policy Learning

arXiv.org Artificial Intelligence

Developing conversational agents to engage in complex dialogues is challenging partly because the dialogue policy needs to explore a large state-action space. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given a set of successful dialogue sessions, we present a Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a hierarchical policy which consists of 1) a top-level policy that selects among subgoals, and 2) a low-level policy that selects primitive actions to accomplish the subgoal. We exemplify our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that an agent trained with automatically discovered subgoals performs competitively against an agent with human-defined subgoals, and significantly outperforms an agent without subgoals. Moreover, we show that learned subgoals are human comprehensible.
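The two-level control flow described above can be sketched as follows. This is a toy, rule-based stand-in, not the paper's method: the subgoals are written by hand rather than discovered by the SDN, both policies are fixed rules rather than learned, and every name (`SUBGOAL_SLOTS`, `run_episode`, the slot names) is hypothetical.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]  # assumption: the state is the set of slots filled so far

# Hand-written subgoals for a toy travel-planning task (hypothetical).
SUBGOAL_SLOTS = {
    "book_flight": ["flight_date", "flight_dest"],
    "book_hotel": ["hotel_city", "hotel_nights"],
}


def subgoal_done(state: State, subgoal: str) -> bool:
    return all(slot in state for slot in SUBGOAL_SLOTS[subgoal])


def task_done(state: State) -> bool:
    return all(subgoal_done(state, g) for g in SUBGOAL_SLOTS)


def top_level(state: State) -> str:
    """Top-level policy: pick the first subgoal that is not yet complete."""
    return next(g for g in SUBGOAL_SLOTS if not subgoal_done(state, g))


def low_level(state: State, subgoal: str) -> str:
    """Low-level policy: request the first unfilled slot of the subgoal."""
    slot = next(s for s in SUBGOAL_SLOTS[subgoal] if s not in state)
    return "request:" + slot


def step(state: State, action: str) -> State:
    """Toy environment: the simulated user always supplies the requested slot."""
    return state | {action.split(":", 1)[1]}


def run_episode(
    state: State,
    top: Callable[[State], str] = top_level,
    low: Callable[[State, str], str] = low_level,
    max_steps: int = 50,
) -> List[Tuple[str, str]]:
    """Alternate between subgoal selection and primitive actions."""
    trace: List[Tuple[str, str]] = []
    subgoal: Optional[str] = None
    for _ in range(max_steps):
        if task_done(state):
            break
        # Select a new subgoal only when none is active or the current one is done.
        if subgoal is None or subgoal_done(state, subgoal):
            subgoal = top(state)
        action = low(state, subgoal)
        state = step(state, action)
        trace.append((subgoal, action))
    return trace
```

Calling `run_episode(frozenset())` yields a trace of `(subgoal, action)` pairs in which all flight slots are requested before any hotel slot, which is the behaviour a hierarchical policy is meant to produce once it commits to a subgoal.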


Reinforcement Learning for Spoken Dialogue Systems

Neural Information Processing Systems

Recently, a number of authors have proposed treating dialogue systems as Markov decision processes (MDPs). However, the practical application of MDP algorithms to dialogue systems faces a number of severe technical challenges. We have built a general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework, and have applied it to dialogue corpora gathered from two dialogue systems built at AT&T Labs. Our experiments demonstrate that RLDS holds promise as a tool for "browsing" and understanding correlations in complex, temporally dependent dialogue corpora.

