Rates of Convergence of Performance Gradient Estimates Using Function Approximation and Bias in Reinforcement Learning
Grudic, Gregory Z., Ungar, Lyle H.
Neural Information Processing Systems
We address two open theoretical questions in Policy Gradient Reinforcement Learning. The first concerns the efficacy of using function approximation to represent the state-action value function.
Dec-31-2002
- Country:
  - North America > United States
    - Pennsylvania (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.05)
    - Colorado > Boulder County
      - Boulder (0.04)
    - California > San Mateo County
      - Menlo Park (0.04)
- Technology: