Reinforcement Learning for Spoken Dialogue Systems

Neural Information Processing Systems

Recently, a number of authors have proposed treating dialogue systems as Markov decision processes (MDPs). However, the practical application of MDP algorithms to dialogue systems faces a number of severe technical challenges. We have built a general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework, and have applied it to dialogue corpora gathered from two dialogue systems built at AT&T Labs. Our experiments demonstrate that RLDS holds promise as a tool for "browsing" and understanding correlations in complex, temporally dependent dialogue corpora.
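The abstract frames a dialogue system as an MDP whose states summarize the dialogue so far and whose actions are the system's possible prompts or strategies. As a rough, purely illustrative sketch of that framing (the state features, action names, rewards, and the Monte Carlo estimator below are assumptions for illustration, not the actual RLDS tool), one can estimate state-action values directly from a logged dialogue corpus:

```python
# Purely illustrative sketch of the "dialogue system as MDP" framing.
# The state features, action names, rewards, and estimator below are
# assumptions for illustration; they are not the RLDS tool itself.
from collections import defaultdict

# A logged dialogue is a sequence of (state, action, reward) turns, where a
# "state" is a small tuple of hand-chosen features, e.g.
# (dialogue_phase, number_of_reprompts_so_far).
corpus = [
    [(("start", 0), "ask_open", 0.0), (("got_info", 0), "confirm", 1.0)],
    [(("start", 0), "ask_directed", 0.0), (("got_info", 1), "reprompt", -0.2),
     (("got_info", 1), "confirm", 1.0)],
]

def monte_carlo_q(dialogues, gamma=0.95):
    """Every-visit Monte Carlo estimate of Q(s, a) from logged dialogues."""
    returns = defaultdict(list)
    for dialogue in dialogues:
        g = 0.0
        # Walk each dialogue backwards, accumulating the discounted return.
        for state, action, reward in reversed(dialogue):
            g = reward + gamma * g
            returns[(state, action)].append(g)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}

for (state, action), q in monte_carlo_q(corpus).items():
    print(f"Q({state}, {action}) = {q:.3f}")
```

Estimates of this kind are the sort of corpus-level, temporally dependent correlations that a browsing tool such as RLDS is meant to help surface.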


AI Magazine

The Dialogue on Dialogues workshop was organized as a satellite event at the Interspeech 2006 conference in Pittsburgh, Pennsylvania, and was held on September 17, 2006, immediately before the main conference. It was planned and coordinated by Michael McTear (University of Ulster, UK), Kristiina Jokinen (University of Helsinki, Finland), and James A. Larson (Portland State University, USA). The one-day workshop involved more than 40 participants from Europe, the United States, Australia, and Japan. One of the motivations for furthering the systems' interaction capabilities is to improve their naturalness and usability. However, relatively little work has so far been devoted to defining the criteria according to which we could evaluate such systems in terms of increased naturalness and usability. It is often felt that statistical speech-based research is not fully appreciated in the dialogue community, while dialogue modeling in the speech community seems too simple relative to the advanced architectures and functionalities under investigation in the dialogue community.


Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation

arXiv.org Artificial Intelligence

In task-oriented dialogue systems, dialogue policy optimization typically does not obtain feedback until task completion. This is insufficient for training intermediate dialogue turns, since supervision signals (or rewards) are only provided at the end of a dialogue. To address this issue, reward learning has been introduced to learn from the state-action pairs of an optimal policy and provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor intensive. To overcome this limitation, we propose a novel reward-learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as the reward function, which models dialogue progress (i.e., state-action sequences) based on expert demonstrations, either with or without annotations. The dynamics model computes rewards by predicting whether the dialogue progress is consistent with expert demonstrations. We further propose to learn action embeddings for better generalization of the reward function. The proposed approach outperforms competitive policy-learning baselines on MultiWOZ, a benchmark multi-domain dataset.
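As a rough illustration of the dynamics-model-as-reward idea described above, the sketch below scores each dialogue turn by the probability a sequence model (conceptually trained on expert state-action trajectories) assigns to the action actually taken. The architecture, dimensions, and names are assumptions made for illustration; they are not the model or code from the paper.

```python
# Hypothetical sketch of a dynamics-model reward for dialogue policy learning.
# Architecture, dimensions, and the probability-as-reward choice are
# illustrative assumptions; they are not the model proposed in the paper.
import torch
import torch.nn as nn

class DynamicsReward(nn.Module):
    """Scores dialogue progress by how predictable it is under a model of
    expert (human-to-human) state-action sequences."""

    def __init__(self, state_dim, num_actions, action_emb_dim=32, hidden=64):
        super().__init__()
        # Learned action embeddings, intended to help the reward generalize
        # across related actions.
        self.action_emb = nn.Embedding(num_actions, action_emb_dim)
        self.rnn = nn.GRU(state_dim + action_emb_dim, hidden, batch_first=True)
        self.next_action = nn.Linear(hidden, num_actions)

    def forward(self, states, actions):
        # states:  (batch, turns, state_dim) belief-state features
        # actions: (batch, turns) integer action ids
        x = torch.cat([states, self.action_emb(actions)], dim=-1)
        h, _ = self.rnn(x)
        return self.next_action(h)  # logits over the next action at each turn

    def reward(self, states, actions):
        """Per-turn reward: probability the model assigns to the action
        actually taken, i.e. consistency with expert-like dialogue progress."""
        logits = self.forward(states, actions)
        probs = torch.softmax(logits[:, :-1], dim=-1)
        taken = actions[:, 1:].unsqueeze(-1)
        return probs.gather(-1, taken).squeeze(-1)

# Example: score a batch of 2 dialogues, 5 turns, 10-dim states, 6 actions.
model = DynamicsReward(state_dim=10, num_actions=6)
s = torch.randn(2, 5, 10)
a = torch.randint(0, 6, (2, 5))
print(model.reward(s, a).shape)  # torch.Size([2, 4])
```

Trained on expert demonstrations, a model of this kind can then supply turn-by-turn rewards to a policy learner in place of the sparse end-of-dialogue signal.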