Meta-Gradient Reinforcement Learning

Zhongwen Xu, Hado P. van Hasselt, David Silver

Feb-14-2020, 10:41:23 GMT, Neural Information Processing Systems

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves.
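To make the role of the return concrete, the following is a minimal Python sketch of one common family of returns, the λ-return, in which a discount factor and a bootstrapping parameter control exactly the choices described above. It is an illustration under stated assumptions and is not taken from this page: the function name `lambda_return` and its arguments are hypothetical, and the value estimates are assumed to be given rather than learned.

```python
def lambda_return(rewards, next_values, gamma, lam):
    """Compute lambda-returns for one trajectory, working backwards.

    Illustrative sketch only (hypothetical helper, not the authors' code).
    rewards[t]     : reward received after step t, i.e. R_{t+1}
    next_values[t] : current estimate v(S_{t+1}); use 0.0 at a terminal state
    gamma          : discount factor in [0, 1]
    lam            : bootstrapping parameter in [0, 1]
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = [0.0] * len(rewards)
    # Beyond the end of the trajectory we can only bootstrap from the
    # last available value estimate.
    g = next_values[-1] if rewards else 0.0
    for t in reversed(range(len(rewards))):
        # Mix the bootstrapped estimate (weight 1 - lam) with the
        # longer sampled return (weight lam), then discount and add reward.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns


rewards = [1.0, 1.0, 1.0]
next_values = [0.5, 0.5, 0.0]  # final state treated as terminal

one_step = lambda_return(rewards, next_values, gamma=0.9, lam=0.0)
monte_carlo = lambda_return(rewards, next_values, gamma=0.9, lam=1.0)
```

With `lam=0.0` each target bootstraps immediately from the next value estimate (a one-step return); with `lam=1.0` and a terminal final state the target is the full discounted sum of sampled rewards. Changing `gamma` or `lam` changes the learning target itself, which is the sense in which the choice of return shapes the algorithm.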

algorithm, meta-gradient reinforcement learning, value function
