Reviews: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
– Neural Information Processing Systems
Summary: This paper proposes a new algorithm that helps stabilize off-policy Q-learning. The idea is to introduce approximate Bellman updates in which the backup is computed over actions constrained to lie in the support of the training data distribution. The paper identifies bootstrapping error as the main source of instability: the bootstrapping process may back up values from actions that do not lie in the training data distribution, and these errors accumulate through repeated updates. This work shows a way to mitigate this issue.
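The core idea can be illustrated with a minimal tabular sketch (illustrative only; the setup and names are my own, not the paper's actual algorithm): the max in the Bellman target is restricted to actions actually observed in the dataset for the next state, so the target never bootstraps from out-of-distribution actions.

```python
GAMMA, LR = 0.9, 0.5
N_STATES, N_ACTIONS = 4, 2

# Hypothetical offline dataset of (s, a, r, s') transitions; action
# coverage is partial, so naive max-over-all-actions would bootstrap
# from actions the data never shows.
dataset = [(0, 0, 1.0, 1), (1, 1, 0.0, 2), (2, 0, 1.0, 3), (3, 1, 0.0, 0)]

# Support set: the actions actually seen in each state.
support = {s: set() for s in range(N_STATES)}
for s, a, _, _ in dataset:
    support[s].add(a)

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def constrained_target(r, s_next):
    # Max restricted to in-support actions; fall back to the immediate
    # reward if no action was ever observed in s_next.
    acts = support[s_next]
    return r + GAMMA * max(Q[s_next][a] for a in acts) if acts else r

# Repeated approximate Bellman updates over the fixed dataset.
for _ in range(200):
    for s, a, r, s_next in dataset:
        Q[s][a] += LR * (constrained_target(r, s_next) - Q[s][a])
```

Because the backup only queries Q at in-support actions, value estimates for unseen actions (which may be arbitrarily wrong under function approximation) never contaminate the targets.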
Jan-26-2025, 21:33:36 GMT
- Technology: