Reviews: Data-Efficient Hierarchical Reinforcement Learning

Neural Information Processing Systems 

Summary: The authors present a hierarchical reinforcement learning approach that learns at two levels: a higher-level agent that learns to select actions in the form of medium-term goals (desired changes in the state variable), and a lower-level agent that aims to achieve (and is rewarded for achieving) these medium-term goals by performing atomic actions. The key contributions identified by the authors are that learning at both the lower and higher levels is off-policy and takes advantage of recent developments in off-policy learning. The authors say that the more challenging aspect is off-policy learning at the higher level, because the actions (sub-goals) chosen during early experience are not effectively met by the low-level policy. Their solution is to replace (or augment) high-level experience with synthetic high-level actions (sub-goals) that would have been more likely to occur under the current instantiation of the low-level controller. An additional key feature is that the sub-goals are given in terms of relative states (deltas) rather than absolute (observed) states, and there is a mechanism to update the sub-goal appropriately as the low-level controller advances.
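To make the two mechanisms in the summary concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes (this is my reading of the method, not something stated in this review) that a relative sub-goal g is updated after each low-level step as s + g - s', that the low-level reward is the negative Euclidean distance to the target state s + g, and that a relabeled sub-goal is chosen from a set of candidates by a squared-error surrogate for how well the current low-level policy reproduces the logged actions. All function names, and the toy policy in the usage lines, are hypothetical.

```python
import numpy as np


def goal_transition(state, goal, next_state):
    """Re-express a relative goal after one low-level step so that it
    still points at the same absolute target (state + goal)."""
    return state + goal - next_state


def low_level_reward(state, goal, next_state):
    """Assumed reward: negative distance from the reached state to the target."""
    return -np.linalg.norm(state + goal - next_state)


def relabel_goal(states, actions, candidates, policy):
    """Return the candidate goal under which `policy` best matches the
    logged low-level actions. `states` has one more entry than `actions`."""
    best_goal, best_score = None, -np.inf
    for candidate in candidates:
        goal = np.asarray(candidate, dtype=float)
        score = 0.0
        for t, action in enumerate(actions):
            # Squared-error surrogate for the action log-likelihood (assumption).
            score -= 0.5 * np.sum((action - policy(states[t], goal)) ** 2)
            # Roll the relative goal forward along the logged trajectory.
            goal = goal_transition(states[t], goal, states[t + 1])
        if score > best_score:
            best_goal, best_score = np.asarray(candidate, dtype=float), score
    return best_goal


# Toy usage: a hypothetical deterministic policy that moves halfway to the goal.
toy_policy = lambda state, goal: 0.5 * goal
states = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([1.5, 0.0])]
actions = [np.array([1.0, 0.0]), np.array([0.5, 0.0])]
candidates = [np.array([0.0, 2.0]), np.array([2.0, 0.0]), np.array([-2.0, 0.0])]
relabeled = relabel_goal(states, actions, candidates, toy_policy)
```

In this toy trajectory the logged actions are exactly what the toy policy would produce for the candidate (2, 0), so that candidate receives the highest score. With a real low-level policy the candidate set and the scoring function are design choices that would need to be checked against the paper.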