Reviews: On Learning Intrinsic Rewards for Policy Gradient Methods
–Neural Information Processing Systems
This work uses learnable intrinsic rewards, in addition to the conventional extrinsic reward from the environment, to improve agent performance, which is measured by the conventional return, i.e., the cumulative extrinsic reward. It can be seen as a variant of earlier work on reward shaping and auxiliary rewards, but not as a generalization of it: although its mathematical form is more general, it drops the use of domain knowledge, which was the key insight of that earlier work. Compared with its closest related works [Sorg et al. 2010, Guo et al. 2016], the proposed method can be used with learning agents that bootstrap, rather than only with planning agents (i.e., Monte Carlo sampling of returns).
1. This work implements an intuitively interesting idea: automatically shaping rewards to boost learning performance. Compared with closely related previous works, it is more general, since it can be used with modern, well-performing learning methods (e.g., A2C and PPO).
2. The presentation is clear and easy to follow.
Oct-7-2024, 10:26:11 GMT