Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James Kostas

Jun-6-2019, arXiv.org Machine Learning

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.

artificial intelligence, machine learning, reinforcement learning


