Action-Gap Phenomenon in Reinforcement Learning

Neural Information Processing Systems 

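Even if we don't know the exact quality (value) of each choice (action), choosing the wrong one is often not a big deal! What matters is the action gap, the difference in value between the best and the second-best action. When the gap is large, small estimation errors do not change which action looks best. When the gap is small, picking the wrong action costs little.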
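The following is a minimal single-state sketch of this intuition. The action values are made up for illustration, and the helper names `action_gap`, `greedy`, `regret_of_greedy`, and `max_error` are hypothetical. Suppose every estimated value is within ε of the true value. Acting greedily on the estimate then loses at most 2ε of true value, and if the action gap exceeds 2ε the greedy action is exactly the optimal one.

```python
from typing import Sequence


def action_gap(q: Sequence[float]) -> float:
    """Value difference between the best and the second-best action."""
    best, second = sorted(q, reverse=True)[:2]
    return best - second


def greedy(q: Sequence[float]) -> int:
    """Index of the highest-valued action (lowest index wins ties)."""
    return max(range(len(q)), key=lambda a: q[a])


def regret_of_greedy(q_true: Sequence[float], q_est: Sequence[float]) -> float:
    """True value lost by acting greedily on the estimate instead of the truth."""
    return max(q_true) - q_true[greedy(q_est)]


def max_error(q_true: Sequence[float], q_est: Sequence[float]) -> float:
    """Largest per-action estimation error (epsilon)."""
    return max(abs(t - e) for t, e in zip(q_true, q_est))


# Hypothetical values for three actions in a single state.
# Small gap: the estimate picks the wrong action, but the loss is tiny.
q_true_small = [1.0, 0.95, 0.2]
q_est_small = [0.9, 1.02, 0.25]

# Large gap: errors of the same size cannot flip the greedy choice.
q_true_large = [1.0, 0.5, 0.2]
q_est_large = [0.9, 0.57, 0.25]

for name, q_true, q_est in [
    ("small gap", q_true_small, q_est_small),
    ("large gap", q_true_large, q_est_large),
]:
    eps = max_error(q_true, q_est)
    print(
        f"{name}: gap={action_gap(q_true):.2f}, eps={eps:.2f}, "
        f"greedy action={greedy(q_est)}, "
        f"regret={regret_of_greedy(q_true, q_est):.2f}"
    )
```

In the small-gap case the gap (0.05) is below 2ε (about 0.2), so the estimate ranks the second action first, yet only 0.05 of value is lost. In the large-gap case the same error level leaves the greedy choice unchanged and the regret is zero.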