We study open domain dialogue generation with dialogue acts designed to explain how people engage in social chat. To imitate human behavior, we propose managing the flow of human-machine interactions with the dialogue acts as policies. The policies and response generation are jointly learned from human-human conversations, and the former is further optimized with a reinforcement learning approach. With the dialogue acts, we achieve significant improvement over state-of-the-art methods on response quality for given contexts and dialogue length in both machine-machine simulation and human-machine conversation.
There is an increasing demand for task-oriented dialogue systems which can assist users in various activities such as booking tickets and restaurant reservations. In order to complete dialogues effectively, dialogue policy plays a key role in task-oriented dialogue systems. As far as we know, the existing task-oriented dialogue systems obtain the dialogue policy through classification, which can assign either a dialogue act and its corresponding parameters or multiple dialogue acts without their corresponding parameters for a dialogue action. In fact, a good dialogue policy should construct multiple dialogue acts and their corresponding parameters at the same time. However, it's hard for existing classification-based methods to achieve this goal. Thus, to address the issue above, we propose a novel generative dialogue policy learning method. Specifically, the proposed method uses attention mechanism to find relevant segments of given dialogue context and input utterance and then constructs the dialogue policy by a seq2seq way for task-oriented dialogue systems. Extensive experiments on two benchmark datasets show that the proposed model significantly outperforms the state-of-the-art baselines. In addition, we have publicly released our codes.
Reinforcement learning methods have been used for learning dialogue policies from the experience of conversations. However, learning an effective dialogue policy frequently requires prohibitively many conversations. This is partly because of the sparse rewards in dialogues, and the relatively small number of successful dialogues in early learning phase. Hindsight experience replay (HER) enables an agent to learn from failure, but the vanilla HER is inapplicable to dialogue domains due to dialogue goals being implicit (c.f., explicit goals in manipulation tasks). In this work, we develop two complex HER methods providing different trade-offs between complexity and performance. Experiments were conducted using a realistic user simulator. Results suggest that our HER methods perform better than standard and prioritized experience replay methods (as applied to deep Q-networks) in learning rate, and that our two complex HER methods can be combined to produce the best performance.
Developing conversational agents to engage in complex dialogues is challenging partly because the dialogue policy needs to explore a large state-action space. In this paper, we propose a divide-and-conquer approach that discovers and exploits the hidden structure of the task to enable efficient policy learning. First, given a set of successful dialogue sessions, we present a Subgoal Discovery Network (SDN) to divide a complex goal-oriented task into a set of simpler subgoals in an unsupervised fashion. We then use these subgoals to learn a hierarchical policy which consists of 1) a top-level policy that selects among subgoals, and 2) a low-level policy that selects primitive actions to accomplish the subgoal. We exemplify our method by building a dialogue agent for the composite task of travel planning. Experiments with simulated and real users show that an agent trained with automatically discovered subgoals performs competitively against an agent with human-defined subgoals, and significantly outperforms an agent without subgoals. Moreover, we show that learned subgoals are human comprehensible.