"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
Disclaimer: This is a work-in-progress project, so there may be errors! To quickly recap my knowledge of Reinforcement Learning, I created this Cheat Sheet with all the basic formulas and algorithms. I hope it is useful to you. You can find the full pdf here, and the repo here. Thanks to AlexandreBeaulne, who added Contraction Mapping and Sarsa and cleaned up the LaTeX.
I'm currently implementing an A3C (Asynchronous Advantage Actor Critic) agent in Tensorflow that plays Doom (using vizdoom), and I was wondering whether it makes a difference to use Capsnets (Capsule Networks) instead of CNNs. Recently there was a big breakthrough in computer vision with these Capsnets. As I understand it, Capsnets, unlike Convnets, model the spatial relationships between features and can detect rotated objects. So is there an advantage to using Capsnets in a Deep Reinforcement Learning agent?
Recent news from the OpenAI people is all about a bonus trio. They are releasing new Gym environments--a set of simulated robotics environments based on real robot platforms--including a Shadow hand and a Fetch research robot, said IEEE Spectrum. In addition to that toolkit, they are releasing an open source version of Hindsight Experience Replay (HER). As its name suggests, it helps robots learn from hindsight in goal-based robotic tasks. Last but not least, they are releasing a set of requests for robotics research.
Today we're going to be learning about reinforcement learning. The ultimate goal of this endeavor is to create an artificial intelligence that is a strong Othello player, and can teach you how to become stronger yourself. I explained the rules of Othello, my motivation, and how to create a playable game in Step 1 of this series. I created some basic artificial intelligence in Step 2 of this series. The next thing I want to do is to use machine learning to create an even better artificial intelligence, but before I can even do that, I need to learn how to implement reinforcement learning.
Can we land a SpaceX Falcon Heavy Rocket in simulation using machine learning? Yes! Reinforcement learning is a technique that lets an agent learn how best to act in an environment using rewards as its signal. OpenAI released a library called Gym that lets us train AI agents really easily. We'll use a combination of the Tensorflow and gym libraries to build an RL agent capable of landing a rocket perfectly.
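To make the idea of "learning how best to act using rewards as the signal" concrete, here is a minimal sketch of tabular Q-learning. It is not the rocket lander and it does not use Gym or Tensorflow: the five-cell corridor, the `step` function, and all hyperparameters are assumptions made up for illustration. The agent starts in cell 0 and only gets a reward when it reaches cell 4.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action_index] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                       # explore
            else:
                best = max(q[state])                       # exploit, random tie-break
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for every non-terminal cell."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Nobody tells the agent that "right" is the correct move. It discovers that by trying both actions and watching which ones eventually lead to reward. A real lander would swap the table for a neural network and the corridor for a Gym environment, but the loop has the same shape.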
In recent months, researchers at OpenAI have been focusing on developing artificial intelligence (AI) that learns better. Their machine learning algorithms are now capable of training themselves, so to speak, thanks to the reinforcement learning methods of their OpenAI Baselines. Now, a new algorithm lets their AI learn from its own mistakes, almost as human beings do. The development comes from a new open-source algorithm called Hindsight Experience Replay (HER), which OpenAI researchers released earlier this week. True to its name, HER helps an AI agent "look back" in hindsight as it completes a task.
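The "look back" step can be illustrated with a small sketch. This is not OpenAI's released code: the `Transition` tuple, the integer states, the helper names, and the 0 / -1 sparse reward are assumptions chosen to keep the example tiny. The idea shown is that an episode which missed its intended goal is stored a second time with the goal replaced by the state the agent actually reached, so the failed attempt still produces a useful reward signal.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(achieved: int, goal: int) -> float:
    """Assumed convention: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def make_episode(states: List[int], goal: int) -> List[Transition]:
    """Build transitions from a visited-state sequence (hypothetical helper)."""
    return [
        Transition(s, s2 - s, s2, goal, sparse_reward(s2, goal))
        for s, s2 in zip(states, states[1:])
    ]


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, pretending the state it ended in was the goal."""
    hindsight_goal = episode[-1].next_state
    return [
        t._replace(goal=hindsight_goal,
                   reward=sparse_reward(t.next_state, hindsight_goal))
        for t in episode
    ]


if __name__ == "__main__":
    failed = make_episode([0, 1, 2], goal=5)   # wanted 5, only got to 2
    for t in failed + relabel_with_final_goal(failed):
        print(t)
```

In the original episode every reward is -1, so the agent learns very little from it. In the relabeled copy the last transition has reward 0, because with goal 2 the episode was a success. Both versions would go into the replay buffer of an off-policy learner.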