Reinforcement Learning
Artificial Intelligence's Next Big Step: Reinforcement Learning - The New Stack
Almost every machine learning breakthrough you hear about (and most of what's currently called "artificial intelligence") is supervised learning, where you start with a curated and labeled data set. But another technique, reinforcement learning, is just starting to make its way out of the research lab. In reinforcement learning, an agent learns by interacting with its environment. It isn't told by a trainer what to do; instead it learns by trial and error which actions earn the highest reward in a given situation, even when the reward isn't obvious or immediate. It learns how to solve problems rather than being taught what solutions look like. Reinforcement learning is how DeepMind created the AlphaGo system that beat a high-ranking Go player (and has recently been winning online Go matches anonymously). It's how University of California Berkeley's BRETT robot learns how to move its hands and arms to perform physical tasks like stacking blocks or screwing the lid onto a bottle, in just three hours (or ten minutes if it's told where the objects it's going to work with are, and where they need to end up).
Learning Policies For Learning Policies -- Meta Reinforcement Learning (RL²) in Tensorflow
Reinforcement Learning provides a framework for training agents to solve problems in the world. One of the limitations of these agents, however, is their inflexibility once trained. They are able to learn a policy to solve a specific problem (formalized as an MDP), but that learned policy is often useless in new problems, even relatively similar ones. Imagine the simplest possible agent: one trained to solve a two-armed bandit task in which one arm always provides a positive reward, and the other arm always provides no reward. Using any RL algorithm, such as Q-Learning or Policy Gradient, the agent can quickly learn to always choose the arm with the positive reward.
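The two-armed bandit described above can be sketched in a few lines. This is a minimal illustration, not the article's TensorFlow code: it assumes a tabular, single-state Q-learning update with epsilon-greedy action selection, and the function name `train_bandit` and its parameters (`reward_probs`, `steps`, `epsilon`, `alpha`, `seed`) are hypothetical choices made for this example.

```python
import random


def train_bandit(reward_probs=(1.0, 0.0), steps=500, epsilon=0.1, alpha=0.1, seed=0):
    """Learn action values for a bandit with one state and len(reward_probs) arms.

    reward_probs[i] is the (assumed) probability that arm i pays a reward of 1.
    """
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)  # one value estimate per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(len(q))
        else:
            # Exploit: pick the best arm so far, breaking ties at random.
            best = max(q)
            arm = rng.choice([i for i, v in enumerate(q) if v == best])
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        # Q-learning update for a single-state task (no next-state term).
        q[arm] += alpha * (reward - q[arm])
    return q


q_values = train_bandit()
greedy_arm = q_values.index(max(q_values))
print("learned values:", q_values, "greedy arm:", greedy_arm)
```

With the default settings the value estimate for the rewarded arm approaches 1 while the other stays at 0, so the greedy policy settles on the rewarded arm. If the arms were swapped after training, these learned values would point at the wrong arm until the agent relearned them, which is the inflexibility the paragraph describes.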
Artificial Intelligence: Reinforcement Learning in Python
When people talk about artificial intelligence, they usually don't mean supervised and unsupervised machine learning. These tasks are pretty trivial compared to what we think of AIs doing: playing chess and Go, driving cars, and beating video games at a superhuman level. Reinforcement learning has recently become popular for doing all of that and more. Much like deep learning, a lot of the theory was discovered in the 70s and 80s, but only recently have we been able to observe first hand the amazing results that are possible. In 2016 we saw Google's AlphaGo beat the world champion in Go. We saw AIs playing video games like Doom and Super Mario.
jiweil/Neural-Dialogue-Generation
This project is maintained by Jiwei Li. This repo will continue to be updated. After training, the trained models will be saved in save_t_given_s/model*. Decoding given a pre-trained generative model. The pre-trained model doesn't have to be a vanilla Seq2Seq model (for example, it can be a trained model from adversarial learning).
VIME: Variational Information Maximizing Exploration
Houthooft, Rein, Chen, Xi, Duan, Yan, Schulman, John, De Turck, Filip, Abbeel, Pieter
Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation, using variational inference in Bayesian neural networks which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.
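For reference, the two heuristic baselines named in the abstract can be sketched as follows. This is only an illustration of epsilon-greedy selection and additive Gaussian control noise, not of VIME itself; the function names, the `low`/`high` action bounds, and the use of Python's standard `random` module are assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Discrete actions: take a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def gaussian_noise_action(mean_action, sigma, rng, low=-1.0, high=1.0):
    """Continuous controls: add zero-mean Gaussian noise to each dimension, then clip.

    The bounds low/high are hypothetical; a real task defines its own action limits.
    """
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in mean_action]


rng = random.Random(0)
discrete_choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
noisy_controls = gaussian_noise_action([0.5, -0.2], sigma=0.1, rng=rng)
print(discrete_choice, noisy_controls)
```

Both heuristics perturb the action without regard to what the agent has already learned about the environment, which is the gap an information-gain bonus on the reward is meant to address.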
Continuing To Learn the Structure of Learning
"Learning to reinforcement learn", by Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick. In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context.
Learning to reinforcement learn
Wang, Jane X, Kurth-Nelson, Zeb, Tirumala, Dhruva, Soyer, Hubert, Leibo, Joel Z, Munos, Remi, Blundell, Charles, Kumaran, Dharshan, Botvinick, Matt
In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
Pioneering AI researcher to advise RBC's machine learning lab
A pioneer in machine learning from the University of Alberta is teaming up with the Royal Bank of Canada on artificial intelligence research. Richard Sutton, a professor at the school's department of computer science and a graduate of the University of Massachusetts, will advise the bank's machine learning research division and collaborate with RBC's second AI research lab, to be located in Edmonton. Sutton specializes in the same branch of machine learning that Google's AlphaGo computer program used, in part, to beat one of the highest-ranking professional players of the board game Go -- until recently, a notoriously difficult game for computers to play. The announcement is the latest in a string of AI-related partnerships, acquisitions and investments that have been struck in Canada in recent months -- the most high-profile of which have involved Facebook and Google, which have been in a fierce competition for access to talent. For over three decades, Sutton has specialized in reinforcement learning. In this branch of machine learning, an algorithm is designed to receive either a reward or penalty based on its behaviour, and learns to make choices that will result in the most reward -- and, hopefully, most desired behaviour -- over time.