Meta-Gradient Reinforcement Learning

Zhongwen Xu, Hado P. van Hasselt, David Silver

Oct-7-2024, 07:01:06 GMT–Neural Information Processing Systems

The goal of reinforcement learning algorithms is to estimate and/or optimise the value function. However, unlike supervised learning, no teacher or oracle is available to provide the true value function. Instead, the majority of reinforcement learning algorithms estimate and/or optimise a proxy for the value function. This proxy is typically based on a sampled and bootstrapped approximation to the true value function, known as a return. The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves.
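To make the notion of a return concrete, the following is a minimal sketch of two common ways to build a sampled, bootstrapped return. It is an illustration under stated assumptions, not the authors' implementation: the function names and the parameters gamma (discount factor) and lam (bootstrapping mixture) are standard textbook conventions introduced here, not taken from the text above.

```python
from typing import List, Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of sampled rewards plus a discounted bootstrap value.

    `gamma` (assumed name) sets how quickly future rewards are discounted;
    `bootstrap_value` is the value estimate for the state reached after the
    last sampled reward.
    """
    g = bootstrap_value
    # Work backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def lambda_returns(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gamma: float,
    lam: float,
) -> List[float]:
    """Lambda-returns for every step of a trajectory.

    `next_values[t]` is the value estimate of the state reached after
    `rewards[t]`. `lam` (assumed name) controls when bootstrapping happens:
    lam = 0 bootstraps after one step, lam = 1 bootstraps only at the end.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    g = next_values[-1]
    out = []
    # G_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * G_{t+1})
    for r, v_next in zip(reversed(rewards), reversed(next_values)):
        g = r + gamma * ((1.0 - lam) * v_next + lam * g)
        out.append(g)
    out.reverse()
    return out
```

Changing gamma or lam changes the target that the value function is trained towards, which is the sense in which the choice of return shapes the algorithm.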

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > Canada (0.46)

Industry:
- Leisure & Entertainment (0.93)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks > Deep Learning (0.94)
  - Reinforcement Learning (1.00)
