Collaborating Authors

ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems Artificial Intelligence

We present ConvLab-2, an open-source toolkit that enables researchers to build task-oriented dialogue systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems. As the successor of ConvLab (Lee et al., 2019b), ConvLab-2 inherits ConvLab's framework but integrates more powerful dialogue models and supports more datasets. Besides, we have developed an analysis tool and an interactive tool to assist researchers in diagnosing dialogue systems. The analysis tool presents rich statistics and summarizes common mistakes from simulated dialogues, which facilitates error analysis and system improvement. The interactive tool provides a user interface that allows developers to diagnose an assembled dialogue system by interacting with the system and modifying the output of each system component.


AI Magazine

The Dialogue on Dialogues workshop was organized as a satellite event at the Interspeech 2006 conference in Pittsburgh, Pennsylvania, and was held on September 17, 2006, immediately before the main conference. It was planned and coordinated by Michael McTear (University of Ulster, UK), Kristiina Jokinen (University of Helsinki, Finland), and James A. Larson (Portland State University, USA). The one-day workshop involved more than 40 participants from Europe, the United States, Australia, and Japan. One of the motivations for furthering the systems' interaction capabilities is to improve the [...] However, relatively little work has so far been devoted to defining the criteria according to which we could evaluate such systems in terms of increased naturalness and usability. It is often felt that statistical speech-based research is not fully appreciated in the dialogue community, while dialogue modeling in the speech community seems too simple compared with the advanced architectures and functionalities under investigation in the dialogue community. (AI Magazine, Volume 28, Number 2, 2007, AAAI.)

Ensemble-Based Deep Reinforcement Learning for Chatbots Artificial Intelligence

Such an agent is typically characterised by: (i) a finite set of states $S = \{s_i\}$ that describe all possible situations in the environment; (ii) a finite set of actions $A = \{a_j\}$ to change the environment from one situation to another; (iii) a state transition function $T(s,a,s')$ that specifies the next state $s'$ for having taken action $a$ in the current state $s$; (iv) a reward function $R(s,a,s')$ that specifies a numerical value given to the agent for taking action $a$ in state $s$ and transitioning to state $s'$; and (v) a policy $\pi: S \rightarrow A$ that defines a mapping from states to actions [2, 30]. The goal of a reinforcement learning agent is to find an optimal policy by maximising its cumulative discounted reward, defined as $Q^*(s,a) = \max_{\pi} \mathbb{E}[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots \mid s_t = s, a_t = a, \pi]$, where function $Q^*$ represents the maximum sum of rewards $r_t$ discounted by factor $\gamma$ at each time step. While a reinforcement learning agent takes actions with probability $\Pr(a \mid s)$ during training, it selects the best action at test time according to $\pi^*(s) = \arg\max_{a \in A} Q^*(s,a)$. A deep reinforcement learning agent approximates $Q^*$ using a multi-layer neural network [31]. The Q function is parameterised as $Q(s,a;\theta)$, where $\theta$ are the parameters or weights of the neural network (a recurrent neural network in our case). Estimating these weights requires a dataset of learning experiences $D = \{e_1, \dots, e_N\}$ (also referred to as 'experience replay memory'), where every experience is described as a tuple $e_t = (s_t, a_t, r_t, s_{t+1})$. Inducing a Q function consists of applying Q-learning updates over minibatches of experience $MB = \{(s,a,r,s') \sim U(D)\}$ drawn uniformly at random from the full dataset $D$. This process is implemented in learning algorithms using Deep Q-Networks (DQN) such as those described in [31, 32, 33], and the following section describes a DQN-based algorithm for human-chatbot interaction.
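The replay-and-update loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tabular Q function in place of the recurrent neural network $Q(s,a;\theta)$, and the two-state chain environment, the learning rate, and all function names (`sample_minibatch`, `q_learning_update`, `greedy_action`, `train`) are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")

# Hypothetical replay memory D for a toy 2-state chain (not from the paper).
# Each experience is a tuple (s, a, r, s_next); s_next is None when the
# transition ends the episode.
MEMORY = [
    (0, "left", 0.0, 0),
    (0, "right", 0.0, 1),
    (1, "left", 0.0, 0),
    (1, "right", 1.0, None),
]


def sample_minibatch(memory, batch_size, rng):
    """Draw experiences uniformly at random (with replacement) from D."""
    return [rng.choice(memory) for _ in range(batch_size)]


def q_learning_update(q, batch, actions, alpha, gamma):
    """Apply one Q-learning update per experience in the minibatch."""
    for s, a, r, s_next in batch:
        if s_next is None:
            best_next = 0.0  # no future reward after a terminal transition
        else:
            best_next = max(q[(s_next, b)] for b in actions)
        target = r + gamma * best_next
        # Tabular stand-in for a gradient step on the network weights theta.
        q[(s, a)] += alpha * (target - q[(s, a)])


def greedy_action(q, s, actions):
    """Test-time policy: pick arg max over a of Q(s, a)."""
    return max(actions, key=lambda a: q[(s, a)])


def train(memory, actions, steps=500, batch_size=4,
          alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a) initialised to 0 for every pair
    for _ in range(steps):
        batch = sample_minibatch(memory, batch_size, rng)
        q_learning_update(q, batch, actions, alpha, gamma)
    return q


q_table = train(MEMORY, ACTIONS)
for state in (0, 1):
    print(state, greedy_action(q_table, state, ACTIONS))
```

In a DQN, the table lookup would be replaced by a forward pass through the network and the in-place update by a gradient step on a squared error between the target and $Q(s,a;\theta)$; the sampling and target computation keep the same shape.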

A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management Machine Learning

Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent years. However, the lack of a common benchmarking framework makes it difficult to perform a fair comparison between different models and their capability to generalise to different environments. Therefore, this paper proposes a set of challenging simulated environments for dialogue model development and evaluation. To provide some baselines, we investigate a number of representative parametric algorithms, namely the deep reinforcement learning algorithms DQN, A2C, and Natural Actor-Critic, and compare them to a non-parametric model, GP-SARSA. Both the environments and policy models are implemented using the publicly available PyDial toolkit and released online, in order to establish a testbed framework for further experiments and to facilitate experimental reproducibility.