AITopics | exploration and uncertainty

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems, Dec-25-2025, 02:10:16 GMT

Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.
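
The Atari figures quoted above are human normalised scores. The sketch below shows how such a score is commonly computed, assuming the usual convention in the Atari literature that 0 corresponds to a random agent and 1 to the human baseline. The abstract does not spell out the formula, so treat this convention as an assumption. The game names and raw scores are hypothetical and serve only to show the arithmetic.

```python
from statistics import median


def human_normalised_score(agent: float, random: float, human: float) -> float:
    """Return (agent - random) / (human - random); 1.0 means human-level."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return (agent - random) / (human - random)


# Hypothetical raw scores per game: (agent, random baseline, human baseline).
# These numbers are made up and are not taken from the paper.
example_scores = {
    "game_a": (300.0, 100.0, 200.0),  # (300 - 100) / (200 - 100) = 2.0
    "game_b": (50.0, 0.0, 100.0),     # (50 - 0) / (100 - 0) = 0.5
    "game_c": (900.0, 100.0, 500.0),  # (900 - 100) / (500 - 100) = 2.0
}

normalised = {
    game: human_normalised_score(*scores)
    for game, scores in example_scores.items()
}

# Summary statistics of the kind reported in the abstract.
median_score = median(normalised.values())
games_above_human = sum(score > 1.0 for score in normalised.values())
```

Under this convention, a median of 2.09 would mean that on the median game the agent's margin over the random baseline is roughly twice the human margin.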

exploration and uncertainty, successor uncertainty, temporal difference learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.82)

Reviews: Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems, Jan-22-2025, 01:18:28 GMT

This paper proposes using Bayesian linear regression to obtain a posterior over successor features as a way of representing uncertainty, from which the authors sample for exploration. I found the characterization of Randomised Policy Iteration strange, as it seems to apply only to UBE and not to Bootstrapped DQN. With Bootstrapped DQN, each model in the ensemble is a value function pertaining to a different policy, so there is no single reference policy. The ensemble is trying to represent a distribution over optimal value functions, rather than value functions for a single reference policy. Proposition 1: in the case of neural networks, and function approximation in general, it is very unlikely that we will get a factored distribution, so this claim does not seem generally applicable. In fact, there should in general be very high correlation between the Q-values of nearby states. Is this claim a direct response to UBE? Also, the analysis fixes the policy in order to consider the distribution of value functions, but this does not seem to be how posterior sampling is normally considered; rather, it is only how UBE considers it.
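
The mechanism the reviewer summarises can be illustrated with a minimal sketch. It assumes one plausible reading: a Gaussian posterior is fitted by Bayesian linear regression on linear reward weights `w`, and Q-values are obtained as the inner product of successor features with a sampled `w`. This is not the authors' implementation. The function names, the random stand-in features, and the `noise_var` and `prior_var` parameters are all hypothetical.

```python
import numpy as np


def blr_posterior(phi, rewards, noise_var=1.0, prior_var=1.0):
    """Gaussian posterior N(mean, cov) over weights w for r ~ phi @ w + noise.

    Assumes a zero-mean isotropic Gaussian prior with variance prior_var.
    """
    d = phi.shape[1]
    precision = phi.T @ phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove numerical asymmetry
    mean = cov @ phi.T @ rewards / noise_var
    return mean, cov


def sample_greedy_action(psi_actions, mean, cov, rng):
    """Draw one w from the posterior and act greedily on Q = psi @ w."""
    w = rng.multivariate_normal(mean, cov)
    q_values = psi_actions @ w
    return int(np.argmax(q_values)), q_values


rng = np.random.default_rng(0)
n_samples, n_features, n_actions = 200, 4, 3

# Synthetic data (assumption): random features and a known weight vector,
# standing in for learned state-action features and observed rewards.
true_w = np.array([1.0, -0.5, 0.0, 2.0])
phi = rng.normal(size=(n_samples, n_features))
rewards = phi @ true_w + 0.1 * rng.normal(size=n_samples)

mean, cov = blr_posterior(phi, rewards, noise_var=0.01, prior_var=1.0)

# Stand-in successor features for each action in the current state.
psi_actions = rng.normal(size=(n_actions, n_features))
action, q_sample = sample_greedy_action(psi_actions, mean, cov, rng)
```

Because a single `w` is drawn and shared across all state-action pairs, the sampled Q-values are correlated through the features rather than factored, which is the property the reviewer's comment on Proposition 1 turns on.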

posterior, temporal difference learning, value function, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Reviews: Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Neural Information Processing Systems, Jan-22-2025, 01:18:18 GMT

In the discussion, the reviewers said they appreciated the clarifications made in the rebuttal. They have indicated what they would like to see improved in a revised version, in particular a clearer presentation.

exploration and uncertainty, successor uncertainty, temporal difference learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Janz, David, Hron, Jiri, Mazur, Przemysław, Hofmann, Katja, Hernández-Lobato, José Miguel, Tschiatschek, Sebastian

Neural Information Processing Systems, Mar-18-2020, 22:16:57 GMT

exploration and uncertainty, successor uncertainty, temporal difference learning, (2 more...)

Neural Information Processing Systems

Genre: Research Report (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
