Reinforcement learning refers to a group of methods from artificial intelligence where an agent learns through trial and error. It differs from supervised learning in that reinforcement learning requires no explicit labels; instead, the agent interacts continuously with its environment. That is, the agent starts in a specific state, performs an action, transitions to a new state and, depending on the outcome, receives a reward. Different strategies (e.g. Q-learning) have been proposed to maximize the overall reward, resulting in a so-called policy, which defines the best possible action in each state. Mathematically, this process can be formalized as a Markov decision process, and solvers for such processes have been implemented in R packages; however, there is currently no R package available for reinforcement learning itself. As a remedy, this paper demonstrates how to perform reinforcement learning in R and, for this purpose, introduces the ReinforcementLearning package. The package provides a remarkably flexible framework and is easily applied to a wide range of problems. We demonstrate its use by drawing upon common examples from the literature (e.g. finding optimal game strategies).
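The state-action-reward loop described above can be sketched in a few lines. This is not the ReinforcementLearning package's API (which is in R); it is a minimal Python illustration with a hypothetical two-state environment and a fixed policy.

```python
# Hypothetical two-state MDP: (state, action) -> (next_state, reward).
transitions = {
    ("s1", "a"): ("s1", 0),
    ("s1", "b"): ("s2", 1),
    ("s2", "a"): ("s1", 5),
    ("s2", "b"): ("s2", 0),
}

# A policy maps each state to the action to take there.
policy = {"s1": "b", "s2": "a"}

def rollout(policy, start="s1", steps=10):
    """Follow the policy through the environment and accumulate rewards."""
    state, total = start, 0
    for _ in range(steps):
        next_state, reward = transitions[(state, policy[state])]
        total += reward
        state = next_state
    return total

print(rollout(policy))  # alternates s1 -> s2 -> s1, collecting 1 and 5: 30
```

A learning algorithm such as Q-learning would discover this policy from the rewards alone rather than having it written down in advance.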
Before explaining what Q-learning is, I will quickly explain the basic principle of reinforcement learning. Reinforcement learning is a category of machine learning algorithms in which a system learns on its own by interacting with its environment. The idea is that the agent receives a reward if the action it takes is correct; otherwise, a penalty is assigned to discourage the action. It is similar to how we train dogs to perform tricks: give the dog a snack for successfully doing a roll, and rebuke it for dirtying your carpet.
Today we'll learn about Q-learning, a value-based reinforcement learning algorithm. This article is the second part of a free series of blog posts about deep reinforcement learning; see the first article here. Let's say you're a knight and you need to save the princess trapped in the castle shown on the map above.
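Since Q-learning is value-based, the knight would learn a table of values Q(s, a) for every state-action pair, updated after each step via Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). As a rough sketch, here is tabular Q-learning on a tiny one-dimensional corridor standing in for the castle map; the corridor layout, rewards, and hyperparameters are all made up for illustration.

```python
import random

N = 5                # corridor cells 0..4; the goal (princess) is in cell 4
ACTIONS = [1, -1]    # step right or left
alpha, gamma, eps = 0.5, 0.9, 0.1

# Q-table: one value per (state, action) pair, initialized to zero.
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

random.seed(0)
for episode in range(200):
    s = 0
    while s != N - 1:
        # Epsilon-greedy: mostly exploit the Q-table, sometimes explore.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0          # reward only at the goal
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy policy read off the learned Q-table: move right in every cell.
policy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N - 1)}
print(policy)
```

The learned values fall off geometrically with distance from the goal (roughly gamma to the power of the remaining steps), which is exactly the discounting effect that makes nearer rewards more attractive.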
This is the first in a series of articles on reinforcement learning and OpenAI Gym. Suppose you're playing a video game. You enter a room with two doors. Behind Door 1 are 100 gold coins, followed by a passageway. Behind Door 2 is 1 gold coin, followed by a second passageway going in a different direction.
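The point of the two doors is that the larger immediate reward is not necessarily the better choice: what matters is the discounted sum of all future rewards. Only the door rewards (100 vs. 1 coin) come from the text; the rewards along each passageway below are invented purely to illustrate the computation.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

door1 = [100, 0, 0, 0]     # big immediate payoff, dead-end passage (assumed)
door2 = [1, 50, 50, 50]    # small payoff, richer passage beyond (assumed)

print(discounted_return(door1))  # 100.0
print(discounted_return(door2))  # 1 + 45 + 40.5 + 36.45 = 122.95
```

Under these assumed passageway rewards, Door 2 wins despite its tiny immediate payoff; with a smaller discount factor gamma, the agent would weight the immediate 100 coins more heavily.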
Let's take a deep dive into reinforcement learning. In this article, we will tackle a concrete problem with modern libraries such as TensorFlow, TensorBoard, Keras, and OpenAI Gym. You will see how to implement one of the fundamental algorithms, deep Q-learning, and learn its inner workings. As for hardware, all of the code will run on a typical PC and use all available CPU cores (this is handled out of the box by TensorFlow). The problem is called Mountain Car: a car sits on a one-dimensional track, positioned between two mountains. The goal is to drive up the mountain on the right (reaching the flag). However, the car's engine is not strong enough to climb the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum. This problem was chosen because it is simple enough to solve with reinforcement learning in minutes on a single CPU core.
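To make the "build up momentum" idea concrete, here is a self-contained sketch of the Mountain Car dynamics (the constants are intended to match Gym's MountainCar-v0, but treat them as assumptions) driven by a hand-written heuristic that always pushes in the direction the car is already moving. This is not the article's deep Q-learning agent; the learned agent replaces exactly this hand-written policy.

```python
import math

def step(pos, vel, action):
    """One tick of the Mountain Car physics. action: 0=left, 1=idle, 2=right."""
    vel += 0.001 * (action - 1) - 0.0025 * math.cos(3 * pos)
    vel = max(-0.07, min(0.07, vel))          # speed limit
    pos = max(-1.2, min(0.6, pos + vel))      # track bounds
    if pos == -1.2 and vel < 0:               # hitting the left wall stops the car
        vel = 0.0
    return pos, vel

pos, vel, steps = -0.5, 0.0, 0                # start in the valley, at rest
while pos < 0.5 and steps < 500:              # 0.5 is the flag position
    action = 2 if vel >= 0 else 0             # push along the current velocity
    pos, vel = step(pos, vel, action)
    steps += 1

print(steps, round(pos, 3))
```

The heuristic rocks the car back and forth, pumping energy into each swing until it crests the right-hand hill, which is precisely why a greedy "always push right" strategy fails here.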