Unpacking Reward Shaping
–Neural Information Processing Systems
Much of this work is based on upper confidence bound (UCB) principles and prescribes an exploration bonus that prioritizes rarely visited regions.
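As a rough illustration of the exploration-bonus idea, the sketch below adds a count-based bonus to the environment reward, so that states visited less often receive a larger bonus. The class name `CountBonus`, the scale `beta`, and the `beta / sqrt(N(s))` form are assumptions chosen for illustration; they are a common UCB-style choice, not necessarily the bonus analyzed in the paper.

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical count-based exploration bonus: beta / sqrt(N(s))."""

    def __init__(self, beta=1.0):
        self.beta = beta                # assumed bonus scale
        self.counts = defaultdict(int)  # visit counts N(s), default 0

    def shaped_reward(self, state, reward):
        # Record the visit, then add a bonus that shrinks as the
        # visit count for this state grows.
        self.counts[state] += 1
        return reward + self.beta / math.sqrt(self.counts[state])
```

With `beta=1.0`, the first visit to a state adds a bonus of 1.0 and later visits add progressively less, which steers an agent toward regions it has seen rarely.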
Feb-9-2026, 09:55:56 GMT
- Country:
- Oceania > Australia
- Queensland > Brisbane (0.04)
- North America > United States
- Washington > King County > Seattle (0.04)
- Asia
- Middle East > Jordan (0.04)
- Japan > Honshū
- Kansai > Osaka Prefecture > Osaka (0.04)
- Technology: