r/MachineLearning - [R] Discounted Reinforcement Learning Is Not an Optimization Problem
If one policy has a value greater than or equal to another's in every state, we might say that policy is better. The policy gradient paper guarantees that locally optimal policies can be found with function approximation, where "optimal" is measured by a scalar functional of the policy. This functional returns either the long-term average reward or the discounted cumulative reward from a designated start state. In practice, one would obtain an unbiased estimator for the gradient of this functional w.r.t.
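
To make the state-wise comparison in the first sentence concrete, here is a minimal sketch on a hypothetical two-state tabular MDP. The transition matrices, rewards, discount factor, and the helper names `policy_value` and `dominates` are all assumptions for illustration and do not come from the post. It suggests why "greater or equal value in all states" is only a partial order: two policies can each be better in a different state.

```python
import numpy as np

def policy_value(P, r, gamma):
    """Discounted value of a fixed policy: solves v = r + gamma * P @ v.

    P is the state-to-state transition matrix induced by the policy,
    r is the expected one-step reward in each state under the policy.
    """
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def dominates(v_a, v_b, tol=1e-9):
    """True if v_a >= v_b in every state (up to a small tolerance)."""
    return bool(np.all(v_a >= v_b - tol))

gamma = 0.9           # assumed discount factor
P_stay = np.eye(2)    # hypothetical MDP: every action keeps the agent in place

# Three hypothetical policies that differ only in the reward they collect.
r_a = np.array([1.0, 0.0])  # policy A: good in state 0 only
r_b = np.array([0.0, 1.0])  # policy B: good in state 1 only
r_c = np.array([1.0, 1.0])  # policy C: good in both states

v_a = policy_value(P_stay, r_a, gamma)
v_b = policy_value(P_stay, r_b, gamma)
v_c = policy_value(P_stay, r_c, gamma)

print("C dominates A:", dominates(v_c, v_a))
print("A dominates B:", dominates(v_a, v_b))
print("B dominates A:", dominates(v_b, v_a))
```

In this toy case C is better than A and B in the state-wise sense, while A and B cannot be ranked against each other. Picking a designated start state, as the functional described above does, turns the comparison into a single number per policy.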
Dec-18-2019, 11:54:09 GMT