Introduction to Reinforcement Learning (RL) -- Part 4 -- "Dynamic Programming"

#artificialintelligence 

Starting in this chapter, the assumption is that the environment is a finite Markov Decision Process (finite MDP). In this chapter we'll see how we can use DP algorithms to compute the value functions in a slightly different, less intractable way. The general idea is to take these 2 equations, and turn them into update rules for for improving the approximations of our value functions. It will make more sense later on. Policy Evaluation Policy evaluation means computing the state-value function Vπ for an arbitrary policy π.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found