Unpacking Reward Shaping
Neural Information Processing Systems
Much of this work is based on upper confidence bound (UCB) principles and prescribes some kind of exploration bonus to prioritize exploration of rarely visited regions.
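As a rough illustration of what such an exploration bonus can look like, the sketch below uses a count-based bonus that shrinks as a state-action pair is visited more often. It is not taken from the paper: the class name `CountBonus`, the `scale` parameter, and the `scale / sqrt(n + 1)` form are all assumptions chosen for simplicity, in the spirit of UCB-style bonuses.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based exploration bonus (illustrative only)."""

    def __init__(self, scale=1.0):
        self.scale = scale              # assumed bonus coefficient
        self.counts = defaultdict(int)  # visit counts per (state, action)

    def bonus(self, state, action):
        # Rarely visited pairs get a larger bonus; it decays as 1/sqrt(n + 1).
        n = self.counts.get((state, action), 0)
        return self.scale / math.sqrt(n + 1)

    def update(self, state, action):
        # Record one more visit to this (state, action) pair.
        self.counts[(state, action)] += 1


def reward_with_bonus(reward, bonus):
    # The agent optimizes the environment reward plus the exploration bonus.
    return reward + bonus
```

An agent using this sketch would call `update` after every transition and add `bonus(state, action)` to the observed reward, so that regions with low visit counts look temporarily more attractive.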
Feb-9-2026, 09:55:56 GMT
- Country:
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Oceania > Australia > Queensland > Brisbane (0.04)
- Technology: