Unpacking Reward Shaping

Neural Information Processing Systems 

Much of this work is based on upper confidence bound (UCB) principles and prescribes some kind of exploration bonus to prioritize exploration of rarely visited regions.
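
As a rough illustration of the kind of exploration bonus described above, the sketch below assumes a tabular setting and a count-based bonus of the form beta / sqrt(N(s, a)), one common UCB-style choice. The class name `CountBonus`, the parameter `beta`, and the exact bonus formula are assumptions made for illustration, not taken from the source.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based exploration bonus (illustrative only)."""

    def __init__(self, beta=1.0):
        self.beta = beta                  # assumed scale of the bonus
        self.counts = defaultdict(int)    # visit counts N(s, a)

    def update(self, state, action):
        # Record one more visit to the (state, action) pair.
        self.counts[(state, action)] += 1

    def bonus(self, state, action):
        # Bonus shrinks as the visit count grows; unvisited pairs are
        # treated as if visited once to avoid division by zero.
        n = self.counts.get((state, action), 0)
        return self.beta / math.sqrt(max(n, 1))


def shaped_reward(reward, bonus_fn, state, action):
    # Environment reward plus the exploration bonus.
    return reward + bonus_fn.bonus(state, action)
```

Under these assumptions, a pair visited four times receives half the bonus of an unvisited one, so an agent maximizing the augmented reward is nudged toward rarely visited regions.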
