Action-Gap Phenomenon in Reinforcement Learning

School of Computer Science, McGill University, Montreal, Quebec, Canada

Neural Information Processing Systems 

Many reinforcement learning practitioners have observed that an agent's performance often comes very close to optimal even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of action-gap regularity.
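
The observation can be illustrated with a toy sketch. All numbers below are hypothetical and chosen only for illustration; the names `q_star`, `q_hat`, `greedy`, `max_abs_error`, and `action_gaps` are not from the paper. The sketch also assumes that the action gap in a state means the difference between the best and second-best optimal action values, which this section does not itself define. Under these assumptions, an estimate that is off by 1.5 in every entry still induces the same greedy policy as the optimal values, because the error stays below half of the smallest gap.

```python
# Hypothetical optimal action values for 3 states and 2 actions
# (illustrative numbers only, not taken from the paper).
q_star = [[1.0, 5.0], [4.0, 0.5], [2.0, 6.0]]

# Hypothetical estimate: every entry is off by 1.5, yet the
# ordering of the two actions is preserved in each state.
q_hat = [[2.5, 3.5], [2.5, 2.0], [3.5, 4.5]]


def greedy(q):
    """Return the index of the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def max_abs_error(q_a, q_b):
    """Largest absolute difference between two value tables."""
    return max(abs(a - b) for ra, rb in zip(q_a, q_b) for a, b in zip(ra, rb))


def action_gaps(q):
    """Assumed gap definition for two actions: |Q(x, 0) - Q(x, 1)| per state."""
    return [abs(row[0] - row[1]) for row in q]


if __name__ == "__main__":
    print("greedy actions under q_star:", greedy(q_star))
    print("greedy actions under q_hat: ", greedy(q_hat))
    print("max estimation error:", max_abs_error(q_star, q_hat))
    print("action gaps of q_star:", action_gaps(q_star))
```

In this example the value estimate is far from accurate relative to the scale of the values, yet the agent that acts greedily with respect to it behaves optimally in every state.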