Classical Policy Gradient: Preserving Bellman's Principle of Optimality
Thomas, Philip S.; Jordan, Scott M.; Chandak, Yash; Nota, Chris; Kostas, James
We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
Jun-6-2019
- Country:
- North America > United States
- Massachusetts (0.29)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.14)
- Genre:
- Research Report (0.40)