Reinforcement Learning
Deep Reinforcement Learning in Large Discrete Action Spaces
Dulac-Arnold, Gabriel, Evans, Richard, van Hasselt, Hado, Sunehag, Peter, Lillicrap, Timothy, Hunt, Jonathan, Mann, Timothy, Weber, Theophane, Degris, Thomas, Coppin, Ben
Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems. Recommender systems, industrial plants and language models are only some of the many real-world tasks involving large numbers of discrete actions for which current methods are difficult or even often impossible to apply. An ability to generalize over the set of actions as well as sub-linear complexity relative to the size of the set are both necessary to handle such tasks. Current approaches are not able to provide both of these, which motivates the work in this paper. Our proposed approach leverages prior information about the actions to embed them in a continuous space upon which it can generalize. Additionally, approximate nearest-neighbor methods allow for logarithmic-time lookup complexity relative to the number of actions, which is necessary for time-wise tractable training. This combined approach allows reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods. We demonstrate our algorithm's abilities on a series of tasks having up to one million actions.
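The lookup step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `nearest_actions`, `select_action`, and the toy embeddings and Q-values are all hypothetical. The brute-force search is a linear-time stand-in for the approximate nearest-neighbor index the abstract refers to, and refining the candidates by picking the highest estimated value is an assumption not stated in the abstract.

```python
import math

def nearest_actions(proto, embeddings, k):
    """Return indices of the k action embeddings closest to `proto`.

    Brute-force O(n) scan; an approximate nearest-neighbor index would be
    needed to get the sub-linear lookup the abstract describes.
    """
    return sorted(range(len(embeddings)),
                  key=lambda i: math.dist(proto, embeddings[i]))[:k]

def select_action(proto, embeddings, q_value, k=3):
    """Pick, among the k nearest discrete actions, the one with the highest value.

    `q_value` is a hypothetical callable mapping an action index to an estimate.
    """
    candidates = nearest_actions(proto, embeddings, k)
    return max(candidates, key=q_value)

# Toy data (made up for illustration): five actions embedded in 2-D.
embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
proto = (0.9, 0.2)                      # continuous "proto-action" from a policy
q = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.9, 4: 5.0}

chosen = select_action(proto, embeddings, q.get, k=3)
print(chosen)
```

In this toy example action 4 has the largest value estimate but is never evaluated, because it lies far from the proto-action; only the k nearest candidates are scored, which is what keeps the per-step cost independent of evaluating every action.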
Texas Hold'em: AI is almost as good as humans at playing poker (Wired UK)
Poker-playing artificial intelligence has already "approached the performance" of human experts and can use "state-of-the-art methods" in its gameplay. Researchers from University College London, including a staff member from DeepMind's Go-defeating team, have created a series of reinforcement learning algorithms that are able to play Texas Hold'em and the simpler Leduc poker. The AI learns the game without any prior knowledge of strategy, teaching itself by playing fictitious matches against itself, according to the paper Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. Research student Johannes Heinrich and lecturer David Silver explain in the paper that the Neural Fictitious Self-Play method they created used deep reinforcement learning "to learn directly from their experience of interacting in the game". The method learnt from its mistakes and developed ways to win the games, while also utilising neural networks.
neural network model for q-learning othello? • /r/MachineLearning
Lately I've been exploring reinforcement learning. I built a Q-learning agent for Othello. It is table-based, so it obviously doesn't work well because the state space of Othello is just too big for a table. So for the past week I've been investigating neural networks as Q-function approximators. I even put one together using Keras/Theano, and something's going right because it wins 90% of games against an opponent that plays purely randomly (but it gets crushed by Monte Carlo, another agent I wrote, even if the simulation time for MC is very, very short).
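For anyone following along, here's a rough sketch of what swapping the table for an approximator looks like. It uses a linear model over hand-made features instead of a neural network, purely to keep it dependency-free; the function names, feature vectors and hyperparameters are all made up for illustration and are not from the post.

```python
def q_value(weights, features):
    """Linear Q estimate: dot product of weights and (state, action) features."""
    return sum(w * x for w, x in zip(weights, features))

def q_update(weights, features, reward, next_features_per_action,
             alpha=0.1, gamma=0.9, terminal=False):
    """One semi-gradient Q-learning step; returns the new weight list.

    `next_features_per_action` holds one feature vector per legal action in
    the next state (hypothetical encoding, e.g. the board after each move).
    """
    target = reward
    if not terminal and next_features_per_action:
        # Bootstrap from the best estimated action in the next state.
        target += gamma * max(q_value(weights, f)
                              for f in next_features_per_action)
    td_error = target - q_value(weights, features)
    # For a linear model the gradient of Q w.r.t. the weights is the features.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
# A winning final move: reward 1, no next state to bootstrap from.
weights = q_update(weights, [1.0, 0.0], 1.0, [], terminal=True)
# An earlier move with no immediate reward, bootstrapping from the next state.
weights = q_update(weights, [0.0, 1.0], 0.0, [[1.0, 0.0], [0.0, 1.0]])
print(weights)
```

The point is that the agent stores a fixed number of weights rather than one table entry per board position, so positions that share features share what was learned.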
Investigating practical linear temporal difference learning
Off-policy reinforcement learning has many applications, including learning from demonstration, learning multiple goal-seeking policies in parallel, and representing predictive knowledge. Recently there has been a proliferation of new policy-evaluation algorithms that fill a longstanding algorithmic void in reinforcement learning: combining robustness to off-policy sampling, function approximation, linear complexity, and temporal difference (TD) updates. This paper contains two main contributions. First, we derive two new hybrid TD policy-evaluation algorithms, which fill a gap in this collection of algorithms. Second, we perform an empirical comparison to elicit which of these new linear TD methods should be preferred in different situations, and make concrete suggestions about practical use.
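As background for the abstract above, the baseline that such algorithms build on can be sketched as a single off-policy linear TD(0) update with an importance-sampling ratio. This is the textbook semi-gradient update, not one of the hybrid algorithms the paper derives, and it is known to be potentially unstable under off-policy sampling, which is the gap the newer methods address. The function name, feature vectors and step sizes below are illustrative assumptions.

```python
def off_policy_td0_update(weights, features, reward, next_features, rho,
                          alpha=0.1, gamma=0.9):
    """One off-policy linear TD(0) step for policy evaluation.

    rho is the importance-sampling ratio pi(a|s) / b(a|s) between the target
    policy pi and the behaviour policy b. The cost is linear in the number of
    features. Returns a new weight list.
    """
    v = sum(w * x for w, x in zip(weights, features))
    v_next = sum(w * x for w, x in zip(weights, next_features))
    delta = reward + gamma * v_next - v          # TD error
    return [w + alpha * rho * delta * x for w, x in zip(weights, features)]

weights = [0.0, 0.0]
weights = off_policy_td0_update(weights, features=[1.0, 0.0], reward=1.0,
                                next_features=[0.0, 1.0], rho=0.5)
print(weights)
```

Each update touches every weight once, so the per-step cost grows linearly with the feature count; that linear complexity is one of the four properties the abstract lists.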
Deep Reinforcement Learning
In this tutorial I will discuss how reinforcement learning (RL) can be combined with deep learning (DL). There are several ways to combine DL and RL, including value-based, policy-based, and model-based approaches with planning. Several of these approaches have well-known divergence issues, and I will present simple methods for addressing these instabilities. The talk will include a case study of recent successes in the Atari 2600 domain, where a single agent can learn to play many different games directly from raw pixel input.
Basics of Computational Reinforcement Learning
In machine learning, the problem of reinforcement learning is concerned with using experience gained through interacting with the world and evaluative feedback to improve a system's ability to make behavioral decisions. This tutorial will introduce the fundamental concepts and vocabulary that underlie this field of study. It will also review recent advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology.
spragunr/deep_q_rl
This code should take 2-4 days to complete. The run_nature.py script uses parameters consistent with the Nature paper. The final policies should be better, but it will take 6-10 days to finish training. Either script will store output files in a folder prefixed with the name of the ROM. Pickled versions of the network objects are stored after every epoch.
Using reinforcement learning in Python to teach a virtual car to avoid obstacles
I'd like to build a self-driving, self-learning RC car that can move around my apartment at top speed without running into anything--especially my cats. But before busting out the soldering iron and scaring the crap out of Echo and Bear, I figured it best to start in a virtual environment. I've learned a lot going from "what's reinforcement learning?" to watching my Robocar skillfully traverse the environment, so I decided to share those learnings with the world. Update, Feb 24, 2016: Part 2 is now available. Update, March 7, 2016: Part 3 is now available.
Guest Post (Part I): Demystifying Deep Reinforcement Learning - Nervana
Two years ago, a small company in London called DeepMind uploaded their pioneering paper "Playing Atari with Deep Reinforcement Learning" to arXiv. In this paper they demonstrated how a computer learned to play Atari 2600 video games by observing just the screen pixels and receiving a reward when the game score increased. The result was remarkable, because the games and the goals in every game were very different and designed to be challenging for humans. The same model architecture, without any change, was used to learn seven different games, and in three of them the algorithm performed even better than a human! It has been hailed since then as the first step towards general artificial intelligence – an AI that can survive in a variety of environments, instead of being confined to strict realms such as playing chess. No wonder DeepMind was immediately bought by Google and has been at the forefront of deep learning research ever since.