 Reinforcement Learning


Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

This paper presents an end-to-end framework for task-oriented dialog systems using a variant of Deep Recurrent Q-Networks (DRQN). The model is able to interface with a relational database and jointly learn policies for both language understanding and dialog strategy. Moreover, we propose a hybrid algorithm that combines the strength of reinforcement learning and supervised learning to achieve faster learning speed. We evaluated the proposed model on a 20 Question Game conversational game simulator. Results show that the proposed method outperforms the modular-based baseline and learns a distributed representation of the latent dialog state.


Bayesian Reinforcement Learning: A Survey

arXiv.org Machine Learning

Bayesian methods for machine learning have been widely investigated, yielding principled methods for incorporating prior information into inference algorithms. In this survey, we provide an in-depth review of the role of Bayesian methods for the reinforcement learning (RL) paradigm. The major incentives for incorporating Bayesian reasoning in RL are: 1) it provides an elegant approach to action-selection (exploration/exploitation) as a function of the uncertainty in learning; and 2) it provides a machinery to incorporate prior knowledge into the algorithms. We first discuss models and methods for Bayesian inference in the simple single-step Bandit model. We then review the extensive recent literature on Bayesian methods for model-based RL, where prior information can be expressed on the parameters of the Markov model. We also present Bayesian methods for model-free RL, where priors are expressed over the value function or policy class. The objective of the paper is to provide a comprehensive survey on Bayesian RL algorithms and their theoretical and empirical properties.
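The exploration/exploitation idea in the single-step bandit setting can be illustrated with Thompson sampling on Bernoulli arms. This is a minimal sketch rather than an algorithm taken from the survey; the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the arm probabilities in the usage line are assumptions chosen for illustration.

```python
import random


def thompson_sampling(true_probs, steps=2000, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return the pull count per arm.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior on every arm: one pseudo-success and one pseudo-failure.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(steps):
        # Draw one plausible success rate per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled rates; uncertain arms
        # still get chosen whenever their sample happens to be high.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for a Bernoulli likelihood.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8]))
```

As the posterior of a poor arm concentrates on a low value, that arm is sampled less often, so exploration fades as uncertainty shrinks.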


Teaching Your Computer To Play Super Mario Bros. – A Fork of the Google DeepMind Atari Machine Learning Project

#artificialintelligence

The second issue I noticed was that there seemed to be little connection between the network's confidence in its actions and its actual score. I came across another recent paper on something called Double Q Learning, also courtesy of DeepMind, which substantially improved Google's original results. Double Q Learning counters the tendency for Q networks to become overconfident in their predictions. I changed Google's original Deep Q Network to a Double Deep Q Network, and that helped substantially. Finally, the biggest improvement of all came when I was just more patient. Even running on a powerful machine with an Nvidia 980 GPU, the emulator could only go so fast. As a consequence, one million training steps took about an entire day, with quite a bit of variance in the scores along the way.
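The difference between the two update targets can be sketched in a few lines. This is not the author's code: the function names and the plain Python lists standing in for network outputs are assumptions. In Double Q Learning the online network picks the next action and the target network scores it, whereas the standard target lets one network do both, which tends to inflate the estimate.

```python
def standard_q_target(reward, done, q_target_next, gamma=0.99):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    # Action choice comes from the online network's estimates for the next state.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # The value of that action comes from the separate target network.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    q_online_next = [1.0, 3.0, 2.0]  # hypothetical online-network Q-values
    q_target_next = [5.0, 0.5, 4.0]  # hypothetical target-network Q-values
    print(standard_q_target(1.0, False, q_target_next, gamma=0.5))
    print(double_q_target(1.0, False, q_online_next, q_target_next, gamma=0.5))
```

In the usage lines the standard target follows the target network's largest value, while the double target uses the value of the action the online network prefers, which yields a lower estimate for these numbers.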


Machine Learning Techniques Aim to Reduce Traffic (ENGINEERING.com)

#artificialintelligence

It's a problem we can all relate to: sitting in traffic and waiting for a green light. While waiting, you may have even pondered how you would try to improve traffic efficiency--surely there's got to be some way for everyone to get to work on time. But ponder no longer, because a team of engineers from Tsinghua University in China has handed the problem over to machines. The team's recent study makes use of deep reinforcement learning algorithms to optimize traffic signaling, and its promising results suggest there may be a way to arrive on time after all. Let's be clear: traffic is a complex problem to solve, and traffic control engineers have long worked on improving efficiency.


Policy Networks with Two-Stage Training for Dialogue Systems

arXiv.org Artificial Intelligence

In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Processes methods. Summary state and action spaces lead to good performance but require pre-engineering effort, RL knowledge, and domain expertise. In order to remove the need to define such summary spaces, we show that deep RL can also be trained efficiently on the original state and action spaces. Dialogue systems based on partially observable Markov decision processes are known to require many dialogues to train, which makes them unappealing for practical deployment. We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner is considerably boot-strapped from a combination of supervised and batch RL. In addition, convergence to an optimal policy is significantly sped up compared to other deep RL methods initialized on the data with batch RL. All experiments are performed on a restaurant domain derived from the Dialogue State Tracking Challenge 2 (DSTC2) dataset.


matthiasplappert/keras-rl

#artificialintelligence

Just like Keras, keras-rl works with either Theano or TensorFlow, which means that you can train your algorithm efficiently on either CPU or GPU. This means that evaluating and playing around with different algorithms is easy. Of course, you can extend keras-rl according to your own needs. You can use built-in Keras callbacks and metrics or define your own. Moreover, it is easy to implement your own environments and even algorithms by extending a few simple abstract classes.



Demystifying Machine Learning Part 2: Supervised, Unsupervised, and Reinforcement Learning

#artificialintelligence

In the first blog post of this series we introduced the topic of machine learning and discussed why there is a lot of excitement around the topic. In this blog we explore different types of machine learning. Let's start with a simple example that everyone can relate to. You want to teach a three year old some basic discipline of keeping their toys in the right place. The room is full of interlocking blocks and soft toys.


AI will dictate the future of strategy

#artificialintelligence

LONDON: Technological developments will dramatically change the role of agency strategists as they move from a free-associating, subjective approach to a more empirical, objective and advisory role, a leading industry figure has said. Writing in the current issue of Admap, Mark Holden, Worldwide Strategy and Planning Director at PHD, outlines the future direction of strategy that will start to emerge once attribution modelling and demand-side platforms come together. Currently, users log in to the former to pull out insights and then log in to the latter to execute their strategies. "When they are finally joined up, this will create the first closed system our industry has ever experienced," Holden says, "with this the basis into which we can drop a reinforcement learning algorithm." The point about reinforcement learning – an emerging area of artificial intelligence – is that it requires a closed system, where action and outcome are inextricably linked, in order to further develop.


OpenAI Creates a Gym to Train Your AI

#artificialintelligence

OpenAI, a non-profit artificial intelligence research company backed by Elon Musk, launched a toolkit for developing and comparing reinforcement learning algorithms. OpenAI Gym is a suite of environments that includes simulated robotic tasks and Atari games, as well as a website for people to post their results and share code. OpenAI researcher John Schulman shared some details about his organization, why reinforcement learning is important, and how OpenAI Gym will make it easier for AI researchers to design, iterate on, and improve their next-generation applications.
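The agent-environment loop that such a toolkit standardizes can be sketched without installing anything. The example below does not use the Gym package: `CoinGuessEnv` and `run_episode` are hypothetical names, and the environment only mimics the `reset()` / `step(action)` convention of Gym's early releases, in which `step` returns an observation, a reward, a done flag, and an info dictionary.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.7, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.t = 0
        return 0  # a single dummy observation; the task has no real state

    def step(self, action):
        """Apply an action (0 = tails, 1 = heads) and advance one time step."""
        coin = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == coin else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done, {}


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = CoinGuessEnv()
    print(run_episode(env, lambda observation: 1))  # always guess heads
```

Because every environment exposes the same two methods, the same `run_episode` loop can compare different policies or different environments without modification.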