Hybrid Reward Architecture for Reinforcement Learning
van Seijen, Harm; Fatemi, Mehdi; Romoff, Joshua; Laroche, Romain; Barnes, Tavian; Tsang, Jeffrey
Neural Information Processing Systems
One of the main challenges in reinforcement learning (RL) is generalisation. Typical deep RL methods achieve generalisation by approximating the optimal value function with a low-dimensional representation using a deep network. This approach works well in many domains, but in domains where the optimal value function cannot easily be reduced to a low-dimensional representation, learning can be very slow and unstable. This paper contributes towards tackling such challenging domains by proposing a new method called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function.
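The idea of one value function per reward component can be illustrated with a minimal tabular sketch. This is not the paper's deep-network implementation: the class name TabularHRA, the per-head Q-learning update, the summation used to combine heads for action selection, and all hyperparameters are assumptions made for illustration.

```python
class TabularHRA:
    """Hypothetical tabular sketch: one Q-table ("head") per reward component."""

    def __init__(self, n_states, n_actions, n_components, alpha=0.1, gamma=0.9):
        # q[k][s][a] is head k's estimate, trained only on reward component k.
        self.q = [
            [[0.0] * n_actions for _ in range(n_states)]
            for _ in range(n_components)
        ]
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def aggregate(self, state):
        # Assumption: heads are combined by summing their action values.
        n_actions = len(self.q[0][state])
        return [sum(head[state][a] for head in self.q) for a in range(n_actions)]

    def act(self, state):
        # Greedy action with respect to the aggregated values.
        totals = self.aggregate(state)
        return max(range(len(totals)), key=totals.__getitem__)

    def update(self, state, action, rewards, next_state, done):
        # `rewards` holds one scalar per component (the decomposed reward).
        # Assumption: each head runs its own Q-learning update.
        for head, reward in zip(self.q, rewards):
            bootstrap = 0.0 if done else self.gamma * max(head[next_state])
            td_error = reward + bootstrap - head[state][action]
            head[state][action] += self.alpha * td_error


# Usage: two reward components; only the first pays out on this transition.
agent = TabularHRA(n_states=2, n_actions=2, n_components=2)
agent.update(state=0, action=1, rewards=[1.0, 0.0], next_state=1, done=True)
best_action = agent.act(0)
```

Each head sees only its own component of the reward, so in this sketch the first head's estimate for the visited state-action pair moves while the second head's stays at zero.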
Feb-14-2020, 17:26:16 GMT