Explaining Reinforcement Learning: Active vs Passive

#artificialintelligence 

This post assumes that you are familiar with the basics of Reinforcement Learning (RL) and Markov Decision Processes (MDPs); if not, please refer to this previous post first. Let's consider a problem where the agent can be in various states and can choose an action from a set of actions. Problems of this type are called Sequential Decision Problems, and an MDP is a common way to formalize them. The solution to an MDP is an optimal policy: a choice of action for every state that maximizes the expected cumulative reward. Thus, the transition model, which represents the agent's environment (when the environment is known), and the optimal policy, which decides what action the agent should perform in each state, are the elements required for training the agent to learn a specific behavior.
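To make the relationship between a known transition model and an optimal policy concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration and does not come from the original post: the two states "A" and "B", the actions "stay" and "go", the probabilities, the rewards, and the discount factor of 0.9. It uses value iteration, one standard way to solve a small MDP when the transition model is known.

```python
# Hypothetical toy MDP (all names and numbers are illustrative assumptions).
# P[state][action] is a list of (probability, next_state, reward) outcomes.
P = {
    "A": {
        "stay": [(1.0, "A", 0.0)],
        "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)],
    },
    "B": {
        "stay": [(1.0, "B", 2.0)],
        "go": [(1.0, "A", 0.0)],
    },
}


def q_value(P, V, state, action, gamma):
    """Expected discounted return of taking `action` in `state`, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[state][action])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a known transition model P."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: take the best action value in state s.
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The greedy policy picks, in each state, the action with the highest value.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(policy)
```

In this toy model the resulting policy maps every state to a single action, which is exactly the "choice of action for every state" described above. Note that the sketch only works because the transition model `P` is given up front; when the environment is unknown, the agent has to learn from experience instead.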
