AITopics | per-state uncertainty estimate

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Neural Information Processing Systems, Dec-25-2025, 15:51:13 GMT

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors. We address this by proposing an algorithm that adaptively switches between TD and MC in each state, thus mitigating the propagation of errors. Our method is based on learned confidence intervals that detect biases of TD estimates. We demonstrate in a variety of policy evaluation tasks that this simple adaptive algorithm performs competitively with the best approach in hindsight, suggesting that learned confidence intervals are a powerful technique for adapting policy evaluation to use TD or MC returns in a data-driven way.
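The abstract contrasts MC targets, which use only observed rewards, with TD targets, which bootstrap from the estimate of the next state and can therefore carry a local error upstream. A toy example makes this concrete. The sketch below assumes a deterministic five-state chain with reward 1 on the final transition and γ = 0.9. It pins one state's estimate 0.5 above its true value to stand in for a local approximation error. `mc_returns`, `td_fixed_point` and the `pinned` parameter are hypothetical helpers, not the authors' code. In this tabular toy the error is only propagated upstream and discounted by γ at each step. The amplification the paper describes arises under function approximation, which the sketch does not model.

```python
import numpy as np

def mc_returns(rewards, gamma):
    """MC targets: the discounted return G_t from every step of one episode."""
    out = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        out[t] = g
    return out

def td_fixed_point(rewards, gamma, pinned=None):
    """Batch TD(0) fixed point on a deterministic chain visiting states 0..n-1 in order.

    `pinned` (hypothetical) maps a state to a fixed, possibly wrong, estimate,
    mimicking a local approximation error. One backward sweep suffices because
    the chain is deterministic and acyclic.
    """
    pinned = pinned or {}
    n = len(rewards)
    v = np.zeros(n + 1)  # v[n] is the terminal state, whose value is 0
    for s in reversed(range(n)):
        if s in pinned:
            v[s] = pinned[s]
        else:
            v[s] = rewards[s] + gamma * v[s + 1]  # TD target r + gamma * V(s')
    return v[:n]

rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
G = mc_returns(rewards, 0.9)  # on this deterministic chain, the true values
v_td = td_fixed_point(rewards, 0.9, pinned={3: G[3] + 0.5})
print(np.round(v_td - G, 4))  # TD error per state
```

The TD error at states 0 to 2 equals 0.5 · 0.9^(3 − s). This is the pinned error discounted once per bootstrapping step. State 4, downstream of the error, is unaffected, and the MC target of every state is exact.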

adaptive temporal-difference learning, per-state uncertainty estimate, policy evaluation

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Neural Information Processing Systems, May-27-2025, 12:43:45 GMT

Reviews: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Neural Information Processing Systems, Jan-25-2025, 04:18:04 GMT

The authors propose a novel method that adaptively uses either Monte Carlo (MC) policy evaluation or temporal-difference (TD) learning. The authors aim to balance bias and variance in the reinforcement-learning setting and, to this end, propose the Adaptive TD algorithm. The algorithm takes as input a set of sample episodes, which it uses to bootstrap confidence intervals for the value of each state. It then compares the TD estimate for each state with that state's confidence interval. If the estimate falls inside the interval, the algorithm keeps it. Otherwise, it uses the midpoint of the interval, on the assumption that the TD estimate is biased and inaccurate. The process repeats for a number of epochs, since the TD estimates change as the value estimates of successor states are updated by the adaptive-TD rule. I think this paper shows promise. To my knowledge the method is original, and the numerical experiments suggest it achieves the authors' stated target: dominating TD and MC in the worst case.
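The adaptive rule as the reviewer describes it can be sketched in tabular form. This is an illustration under stated assumptions, not the authors' implementation. The paper uses learned confidence intervals. Here they are replaced by per-state percentile-bootstrap intervals over observed MC returns. The names `bootstrap_ci` and `adaptive_td`, the 95% level, the 1000 resamples and the synchronous per-epoch update are all hypothetical choices.

```python
import numpy as np

def bootstrap_ci(returns, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean MC return of one state."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    resamples = rng.choice(returns, size=(n_boot, len(returns)), replace=True)
    means = resamples.mean(axis=1)
    return float(np.quantile(means, alpha / 2)), float(np.quantile(means, 1 - alpha / 2))

def adaptive_td(transitions, mc_returns, gamma, epochs=50):
    """Tabular sketch of the adaptive rule as described in the review.

    transitions: state -> list of (reward, next_state), next_state None if terminal
    mc_returns:  state -> list of MC returns observed from that state
    """
    ci = {s: bootstrap_ci(g) for s, g in mc_returns.items()}
    v = {s: 0.0 for s in transitions}
    for _ in range(epochs):
        new_v = {}
        for s, samples in transitions.items():
            # Batch TD target: mean of r + gamma * V(s') over the observed transitions.
            td = float(np.mean([r + (gamma * v[s2] if s2 is not None else 0.0)
                                for r, s2 in samples]))
            lo, hi = ci[s]
            # Keep the TD estimate if it lies inside the interval; otherwise treat
            # it as biased and fall back to the interval's midpoint.
            new_v[s] = td if lo <= td <= hi else 0.5 * (lo + hi)
        v = new_v  # synchronous update: new successor values are used next epoch
    return v

# Two-state chain A -> B -> terminal, with MC returns consistent with the rewards.
transitions = {"A": [(0.0, "B")], "B": [(1.0, None)]}
print(adaptive_td(transitions, {"A": [0.9], "B": [1.0]}, gamma=0.9))
```

Both branches of the update return a value inside the state's interval, so no estimate in this sketch can leave its MC confidence interval. On the first epoch, state A's TD target (0) falls outside its interval and the midpoint is used. Once B's estimate is in place, A's TD target agrees with its MC interval and is kept.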

algorithm, confidence interval, td estimate

Reviews: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Neural Information Processing Systems, Jan-25-2025, 04:17:54 GMT

The argumentation defending the proposed approach and the numerical evaluation of its performance on realistic examples are convincing. The reviewers ultimately agree that NeurIPS might not be the best venue for this work, because it has almost no theoretical part. Nevertheless, I recommend giving it a chance on the strength of its other qualities. If the paper is rejected, I recommend that the authors follow the reviewers' suggestions. They could either resubmit to a more specialized conference or add a theoretical analysis (which can be expected to be rather involved).

artificial intelligence, machine learning, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Neural Information Processing Systems, Oct-10-2024, 09:41:40 GMT

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Riquelme, Carlos, Penedones, Hugo, Vincent, Damien, Maennel, Hartmut, Gelly, Sylvain, Mann, Timothy A., Barreto, Andre, Neu, Gergely

Neural Information Processing Systems, Mar-19-2020, 01:31:23 GMT
