Action-Gap Phenomenon in Reinforcement Learning
Neural Information Processing Systems
Many reinforcement learning practitioners have observed that an agent's performance often comes very close to optimal even though the estimated (action-)value function is still far from the optimal one. This paper explains and formalizes this phenomenon by introducing the concept of action-gap regularity.
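The intuition can be illustrated with a minimal sketch. It is not the paper's formal definition of action-gap regularity: it assumes a hypothetical two-state, two-action problem with made-up values, and takes the action gap of a state to be the difference between its best and second-best action values. When the estimation error is smaller than half the smallest gap, the greedy policy of the estimate coincides with the optimal one even though the values themselves are noticeably off.

```python
# Illustrative numbers only; they are assumptions, not results from the paper.
q_star = {"s0": [1.0, 0.2], "s1": [0.3, 0.9]}    # assumed optimal action values
q_hat = {"s0": [0.75, 0.45], "s1": [0.55, 0.65]}  # assumed inaccurate estimate


def greedy(q):
    """Return the index of the highest-valued action in each state."""
    return {s: max(range(len(v)), key=lambda a: v[a]) for s, v in q.items()}


def action_gap(q):
    """Difference between the best and second-best action value per state."""
    gaps = {}
    for s, v in q.items():
        ordered = sorted(v)
        gaps[s] = ordered[-1] - ordered[-2]
    return gaps


def sup_error(q1, q2):
    """Largest absolute difference between two action-value tables."""
    return max(abs(a - b) for s in q1 for a, b in zip(q1[s], q2[s]))


error = sup_error(q_star, q_hat)           # how far the estimate is from q_star
min_gap = min(action_gap(q_star).values())  # smallest gap under q_star
same_policy = greedy(q_star) == greedy(q_hat)

print(f"sup-norm error: {error:.2f}, smallest action gap: {min_gap:.2f}")
print("greedy policies agree:", same_policy)
```

In this toy case the estimate is off by roughly 0.25 in every entry, yet both tables select the same action in each state, so the resulting agent behaves optimally.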
Feb-11-2025, 18:03:59 GMT