Meta-Gradient Reinforcement Learning
Xu, Zhongwen, Hasselt, Hado P. van, Silver, David
Neural Information Processing Systems
The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves.
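To make the notion of a return concrete, here is a minimal sketch of one common form, the n-step bootstrapped return: a discounted sum of sampled rewards plus a discounted value estimate at the point where sampling stops. The function name `n_step_return` and its parameters (`rewards`, `bootstrap_value`, `gamma`) are illustrative assumptions, not notation taken from the paper, and this is not the paper's meta-gradient method.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Discounted sum of sampled rewards plus a discounted bootstrap value.

    rewards: rewards observed after the current state, in time order.
    bootstrap_value: value estimate for the state reached after those
        rewards (use 0.0 if the episode ended there).
    gamma: discount factor, the rate at which future rewards are discounted.
    """
    g = bootstrap_value
    # Work backwards in time: each step adds its reward and discounts
    # everything that comes after it.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 0.0, 2.0]

# Sample all three rewards and do not bootstrap (Monte Carlo style).
full = n_step_return(rewards, bootstrap_value=0.0, gamma=0.9)

# Sample two rewards, then bootstrap from a hypothetical value estimate of 1.5.
two_step = n_step_return(rewards[:2], bootstrap_value=1.5, gamma=0.9)

print(full, two_step)
```

Changing `gamma` or the number of sampled rewards before bootstrapping changes the learning target, which is the sense in which the choice of return shapes the algorithm.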
Feb-14-2020, 10:41:23 GMT