Unpacking Reward Shaping

Neural Information Processing Systems 

Much of this work builds on upper confidence bound (UCB) principles and prescribes an exploration bonus that prioritizes rarely visited regions.
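To make the idea of an exploration bonus concrete, the following is a minimal sketch of a count-based bonus in the UCB spirit. It is an illustration under stated assumptions, not the method of any particular cited work: the class name `CountBonus`, the `scale` parameter, and the specific form scale / sqrt(N(s, a)) are hypothetical choices, and states and actions are assumed to be hashable (for example, tabular indices).

```python
import math


class CountBonus:
    """Illustrative count-based exploration bonus (hypothetical helper).

    Assumption: the bonus decays as scale / sqrt(N(s, a)), one common
    UCB-style choice. Other works use different forms and constants.
    """

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = {}  # maps (state, action) -> visit count

    def update(self, state, action):
        """Record one visit to the (state, action) pair."""
        key = (state, action)
        self.counts[key] = self.counts.get(key, 0) + 1

    def bonus(self, state, action):
        """Return the bonus; unvisited pairs are treated as having count 1."""
        n = self.counts.get((state, action), 0)
        return self.scale / math.sqrt(max(n, 1))


# Usage: a rarely visited pair receives a larger bonus than a frequent one.
tracker = CountBonus(scale=2.0)
for _ in range(4):
    tracker.update("s0", "left")
tracker.update("s0", "right")

frequent = tracker.bonus("s0", "left")   # count 4
rare = tracker.bonus("s0", "right")      # count 1
```

An agent would typically add such a bonus to the observed reward (or to its value estimate) when selecting actions, so that under-explored pairs look temporarily more attractive and the bonus shrinks as visits accumulate.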