
Collaborating Authors

 Cuayáhuitl, Heriberto


Ensemble-Based Deep Reinforcement Learning for Chatbots

arXiv.org Artificial Intelligence

Such an agent is typically characterised by: (i) a finite set of states S = {s_i} that describe all possible situations in the environment; (ii) a finite set of actions A = {a_j} to change the environment from one situation to another; (iii) a state transition function T(s, a, s') that specifies the next state s' for having taken action a in the current state s; (iv) a reward function R(s, a, s') that specifies the numerical value given to the agent for taking action a in state s and transitioning to state s'; and (v) a policy π: S → A that defines a mapping from states to actions [2, 30]. The goal of a reinforcement learning agent is to find an optimal policy by maximising its cumulative discounted reward, defined as Q*(s, a) = max_π E[r_t + γ r_{t+1} + γ² r_{t+2} + ... | s_t = s, a_t = a, π], where Q* represents the maximum sum of rewards r_t discounted by factor γ at each time step. While a reinforcement learning agent takes actions with probability Pr(a | s) during training, at test time it selects the best action according to π*(s) = arg max_{a ∈ A} Q*(s, a). A deep reinforcement learning agent approximates Q* using a multi-layer neural network [31]. The Q function is parameterised as Q(s, a; θ), where θ are the parameters or weights of the neural network (a recurrent neural network in our case). Estimating these weights requires a dataset of learning experiences D = {e_1, ..., e_N} (also referred to as 'experience replay memory'), where every experience is described as a tuple e_t = (s_t, a_t, r_t, s_{t+1}). Inducing a Q function consists of applying Q-learning updates over minibatches of experience MB = {(s, a, r, s') ~ U(D)} drawn uniformly at random from the full dataset D. This process is implemented in learning algorithms using Deep Q-Networks (DQN) such as those described in [31, 32, 33], and the following section describes a DQN-based algorithm for human-chatbot interaction.
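To make the Q-learning update above concrete, here is a minimal sketch of a DQN-style minibatch update from an experience replay memory. It is an illustrative assumption rather than the paper's implementation: it uses a small feed-forward network in PyTorch (the paper parameterises Q with a recurrent neural network), and the state size, action set, discount factor and learning rate are placeholder values.

```python
# Minimal DQN-style sketch (illustrative; not the paper's code).
import random
from collections import deque
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, a; theta); a feed-forward stand-in for the paper's recurrent network."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions))

    def forward(self, s):
        return self.net(s)  # one Q-value per action

state_dim, num_actions, gamma = 8, 4, 0.99   # placeholder sizes
q_net = QNetwork(state_dim, num_actions)
target_net = QNetwork(state_dim, num_actions)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10000)                  # experience replay memory D of tuples (s, a, r, s', done)

def q_learning_update(batch_size=32):
    """One Q-learning update over a minibatch MB drawn uniformly at random from D."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s_next, done = (torch.tensor(x, dtype=torch.float32) for x in zip(*batch))
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)      # Q(s_t, a_t; theta)
    with torch.no_grad():                                            # bootstrapped target r + gamma * max_a' Q(s', a')
        target = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def greedy_action(s):
    """Test-time action selection: argmax over Q-values."""
    with torch.no_grad():
        return int(q_net(torch.tensor(s, dtype=torch.float32)).argmax())
```

At test time the agent acts greedily via `greedy_action`, matching π*(s) = arg max_{a ∈ A} Q*(s, a) above.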


A Data-Efficient Deep Learning Approach for Deployable Multimodal Social Robots

arXiv.org Artificial Intelligence

The deep supervised and reinforcement learning paradigms (among others) have the potential to endow interactive multimodal social robots with the ability to acquire skills autonomously. But it is not yet clear how they can best be deployed in real-world applications. As a step in this direction, we propose a deep learning-based approach for efficiently training a humanoid robot to play multimodal games, and use the game of 'Noughts & Crosses' with two variants as a case study. Its minimum requirements for learning to perceive and interact are based on a few hundred example images, a few example multimodal dialogues and physical demonstrations of robot manipulation, and automatic simulations. In addition, we propose novel algorithms for robust visual game tracking and for competitive policy learning with high winning rates, which substantially outperform DQN-based baselines. While an automatic evaluation shows evidence that the proposed approach can be easily extended to new games with competitive robot behaviours, a human evaluation with 130 humans playing with the Pepper robot confirms that highly accurate visual perception is required for successful game play.


Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards

arXiv.org Artificial Intelligence

Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty of specifying the reward function. We address these problems using clustered actions instead of infinite actions, and a simple but promising reward function based on human-likeness scores derived from human-human dialogue data. We train Deep Reinforcement Learning (DRL) agents using chitchat data in raw text, without any manual annotations. Experimental results using different splits of training data report the following. First, our agents learn reasonable policies in the environments they are familiarised with, but their performance drops substantially when they are exposed to a test set of unseen dialogues. Second, the choice of sentence embedding size between 100 and 300 dimensions does not lead to significantly different results on test data. Third, our proposed human-likeness rewards are reasonable for training chatbots as long as they use lengthy dialogue histories of at least 10 sentences.
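The idea of replacing an infinite action space with a finite set of clustered actions can be sketched briefly. The pipeline below is an assumed, simplified realisation rather than the paper's exact code: candidate responses are mapped to sentence embeddings (random placeholder vectors of 100 dimensions here) and grouped with k-means, so the agent chooses among K cluster IDs instead of arbitrary sentences.

```python
# Sketch of deriving clustered actions from sentence embeddings (illustrative assumptions).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
sentences = ["hello there", "how are you?", "nice weather today", "see you later"]
# Placeholder embeddings; in practice these would come from a sentence encoder
# (e.g. 100- to 300-dimensional vectors, as studied in the paper).
embeddings = rng.normal(size=(len(sentences), 100))

k = 2  # number of clustered actions (placeholder)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)

# The agent's action set is now the k cluster IDs; at generation time a chosen
# cluster is mapped back to a concrete sentence, e.g. a member near the centroid.
cluster_of = {s: int(c) for s, c in zip(sentences, kmeans.labels_)}
print(cluster_of)
```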


A Study on Dialogue Reward Prediction for Open-Ended Conversational Agents

arXiv.org Artificial Intelligence

The amount of dialogue history to include in a conversational agent is often underestimated and/or set in an empirical and thus possibly naive way. This suggests that principled investigations into optimal context windows are urgently needed given that the amount of dialogue history and corresponding representations can play an important role in the overall performance of a conversational system. This paper studies the amount of history required by conversational agents for reliably predicting dialogue rewards. The task of dialogue reward prediction is chosen for investigating the effects of varying amounts of dialogue history and their impact on system performance. Experimental results using a dataset of 18K human-human dialogues report that lengthy dialogue histories of at least 10 sentences are preferred (25 sentences being the best in our experiments) over short ones, and that lengthy histories are useful for training dialogue reward predictors with strong positive correlations between target dialogue rewards and predicted ones.
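As a hedged illustration of the dialogue reward prediction task studied here, the sketch below uses a recurrent model that reads the embeddings of the last N sentences of a dialogue and regresses a scalar reward. The architecture, dimensions and data are assumptions rather than the paper's exact model; the 25-sentence history mirrors the best-performing setting reported in the abstract.

```python
# Sketch of a dialogue reward predictor over N-sentence histories (illustrative assumptions).
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    def __init__(self, embed_dim=100, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)  # reads the dialogue history
        self.head = nn.Linear(hidden, 1)                         # regresses a scalar reward

    def forward(self, history):              # history: (batch, N sentences, embed_dim)
        _, h = self.rnn(history)
        return self.head(h[-1]).squeeze(-1)  # predicted dialogue reward per dialogue

model = RewardPredictor()
history = torch.randn(4, 25, 100)             # e.g. 25-sentence histories of sentence embeddings
target_rewards = torch.randn(4)               # placeholder target dialogue rewards
loss = nn.functional.mse_loss(model(history), target_rewards)
loss.backward()                               # an optimiser step would follow in training
```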


Deep Reinforcement Learning for Multi-Domain Dialogue Systems

arXiv.org Artificial Intelligence

Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) face scalability problems when applied to multiple tasks (domains). We propose a method for multi-domain dialogue policy learning, termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) against NDQN (proposed) using simulations report that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems.


SimpleDS: A Simple Deep Reinforcement Learning Dialogue System

arXiv.org Artificial Intelligence

Almost two decades ago, the (spoken) dialogue systems community adopted the Reinforcement Learning (RL) paradigm, since it offered the possibility to treat dialogue design as an optimisation problem and because RL-based systems can improve their performance over time with experience. Although a large number of methods have been proposed for training (spoken) dialogue systems using RL, the question of "How to train dialogue policies in an efficient, scalable and effective way across domains?" still remains an open problem. One limitation of current approaches is the fact that RL-based dialogue systems still require high levels of human intervention (from system developers), as opposed to automating the dialogue design. Training a system of this kind requires a system developer to provide a set of features to describe the dialogue state, a set of actions to control the interaction, and a performance function to reward or penalise the action-selection process. All of these elements have to be carefully engineered in order to learn a good dialogue policy (or policies). This suggests that one way of advancing the state of the art in this field is to reduce the amount of human intervention in the dialogue design process through higher degrees of automation, i.e. by moving towards truly autonomous learning.
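The three hand-engineered elements named above, i.e. state features, a set of actions and a performance function, can be made concrete with a small sketch. Every name and value below (slots, actions, reward magnitudes) is an illustrative assumption for an information-seeking dialogue and not the SimpleDS implementation.

```python
# Sketch of hand-engineered dialogue state features, actions and reward (illustrative assumptions).
from dataclasses import dataclass

ACTIONS = ["greet", "ask_food_type", "ask_price_range", "present_results", "goodbye"]

@dataclass
class DialogueState:
    food_type_known: bool = False
    price_range_known: bool = False
    results_presented: bool = False

    def features(self):
        # Feature vector describing the dialogue state for the learner
        return [float(self.food_type_known),
                float(self.price_range_known),
                float(self.results_presented)]

def reward(state: DialogueState, action: str) -> float:
    """Performance function that rewards or penalises the action-selection process."""
    if action == "present_results" and state.food_type_known and state.price_range_known:
        return 1.0    # progressing the task is rewarded
    if action == "goodbye" and not state.results_presented:
        return -1.0   # ending before helping the user is penalised
    return -0.05      # small per-turn cost encourages efficient dialogues
```

Engineering such elements by hand is exactly the kind of intervention the paper argues should be reduced through higher degrees of automation.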


Reports of the AAAI 2014 Conference Workshops

AI Magazine

The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2014, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included fifteen workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap between Perception, Action and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities -- Beyond Open Data to Models, Standards and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and The World Wide Web and Public Health Intelligence. This article presents short summaries of those events.


Preface

AAAI Conferences

This workshop contains papers strongly related to interactive systems and robots on the following topics (in no particular order): robot learning from natural language interactions; robot learning from social multimodal interactions; robot learning using crowdsourcing; reinforcement learning with reward inference of conversational behaviors; reinforcement and neural learning to transfer learnt behaviors across tasks; learning from demonstration for human-robot interaction/collaboration; supervised learning for coaching physical skills; visually-aware reinforcement learning in unknown environments; Markov decision processes for adaptive interactions in video games; and Markov decision processes for grounding natural language commands.