Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

Jan-17-2025, 12:52:07 GMT–Neural Information Processing Systems

In reinforcement learning, experience replay stores past samples for reuse. Prioritized sampling is a promising technique for making better use of these samples. Previous prioritization criteria include TD error, recency, and corrective feedback, most of which are heuristically designed. In this work, we start from the regret minimization objective and derive an optimal prioritization strategy for the Bellman update that can directly maximize the return of the policy. The theory suggests that data with higher hindsight TD error, better on-policiness, and more accurate Q values should be assigned higher weights during sampling.
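To make the idea of weighted sampling from a replay buffer concrete, the following is a minimal sketch, not the prioritization formula derived in the paper. It assumes each stored transition already has three hypothetical, non-negative scores (a TD-error magnitude, an on-policiness score, and a Q-value accuracy score) and combines them by a simple product, which is an illustrative choice only. The function names `sampling_probabilities` and `sample_batch` are likewise made up for this example.

```python
import random


def sampling_probabilities(td_errors, on_policy_scores, q_accuracy_scores, eps=1e-6):
    """Turn three per-transition scores into sampling probabilities.

    All inputs are hypothetical, precomputed, non-negative scores where a
    larger value means "more of that property". The product form below is an
    illustrative assumption, not the weighting derived in the paper.
    """
    if not (len(td_errors) == len(on_policy_scores) == len(q_accuracy_scores)):
        raise ValueError("all score lists must have the same length")
    if not td_errors:
        raise ValueError("need at least one transition")

    # eps keeps every transition sampleable even when one score is zero.
    raw = [
        (abs(td) + eps) * (on + eps) * (acc + eps)
        for td, on, acc in zip(td_errors, on_policy_scores, q_accuracy_scores)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(buffer, probs, batch_size, rng=None):
    """Draw a batch (with replacement) according to the given probabilities."""
    if rng is None:
        rng = random.Random()
    return rng.choices(buffer, weights=probs, k=batch_size)
```

With this sketch, a transition whose three scores are all larger than another's receives a strictly higher sampling probability, matching the qualitative direction stated in the abstract; how the paper actually computes and combines these quantities is not reproduced here.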

criteria, off-policy reinforcement learning, regret minimization experience replay, (1 more...)
