On Learning Intrinsic Rewards for Policy Gradient Methods

Zheng, Zeyu, Oh, Junhyuk, Singh, Satinder

Feb-14-2020, 15:25:57 GMT–Neural Information Processing Systems

In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that the agent designer considers good. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem.
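
The definition of the optimal intrinsic reward given above can be written compactly as follows. This is only a sketch: the symbols (\eta for the intrinsic reward parameters, \theta for the policy parameters, \gamma for a discount factor) are assumed here for illustration and are not taken from the abstract.

```latex
% Assumed notation, for illustration only:
%   r^{ex}        extrinsic (task-specifying) reward
%   r^{in}_{\eta} intrinsic reward with parameters \eta
%   \theta(\eta)  policy parameters the agent reaches when it learns from r^{in}_{\eta}
\eta^{*} \;=\; \arg\max_{\eta} \;
  \mathbb{E}_{\pi_{\theta(\eta)}}
  \left[ \sum_{t=0}^{\infty} \gamma^{t} \, r^{ex}(s_t, a_t) \right]
```

Read this way, the agent acts on the intrinsic reward, while the quality of that intrinsic reward is judged only by the extrinsic return of the resulting behavior.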

intrinsic reward function, learning intrinsic reward, reward function

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)