Introduction to Reinforcement Learning (RL) -- Part 4 -- "Dynamic Programming"

Nov-20-2020, 22:40:19 GMT–#artificialintelligence

Starting with this chapter, we assume that the environment is a finite Markov Decision Process (finite MDP). In this chapter we'll see how dynamic programming (DP) algorithms can compute the value functions in a more tractable way than solving the Bellman equations directly. The general idea is to take the two Bellman equations and turn them into update rules for improving our approximations of the value functions. This will make more sense later on.

Policy Evaluation

Policy evaluation means computing the state-value function Vπ for an arbitrary policy π.
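A minimal sketch of iterative policy evaluation, assuming the finite MDP is stored as a dictionary P[s][a] of (probability, next_state, reward) triples (a hypothetical encoding chosen for illustration, not taken from the original article). Each sweep replaces V(s) with the right-hand side of the Bellman expectation equation, V(s) ← Σ_a π(a|s) Σ_{s',r} p(s',r|s,a) [r + γ V(s')], which is the "turn the equation into an update rule" idea described above. The names policy_evaluation, gamma, theta, and the two-state example MDP are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding (an assumption for this sketch):
# P[s][a] is a list of (probability, next_state, reward) transitions.
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[int, List[Transition]]]


def policy_evaluation(
    P: MDP,
    policy: Dict[int, Dict[int, float]],
    gamma: float = 0.9,
    theta: float = 1e-8,
) -> Dict[int, float]:
    """Iterative policy evaluation: use the Bellman expectation equation
    for V^pi as an update rule until the values stop changing."""
    V = {s: 0.0 for s in P}  # arbitrary initial guess V_0
    while True:
        delta = 0.0  # largest change seen during this sweep
        for s in P:
            # Expected one-step return under pi, bootstrapping from current V
            new_v = sum(
                pi_a * prob * (reward + gamma * V[s_next])
                for a, pi_a in policy[s].items()
                for prob, s_next, reward in P[s][a]
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update within the sweep
        if delta < theta:
            return V


# Hypothetical 2-state example: in state 0, action 0 ("stay") gives reward 0
# and stays put; action 1 ("go") gives reward 1 and moves to absorbing state 1.
P: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 0.0)]},
}
random_policy = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}}
V = policy_evaluation(P, random_policy, gamma=0.9)
print(V)  # V[0] converges to 10/11 (about 0.909); V[1] stays 0.0
```

Because V is updated in place, states later in a sweep already see the new values of earlier states; a two-array version that keeps V_k and V_{k+1} separate also works. For γ < 1 both variants converge to Vπ.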

dynamic programming, policy improvement, value function, (13 more...)

News Web Page

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.44)
  - Machine Learning
    - Reinforcement Learning (0.40)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.56)