Reviews: Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards

Neural Information Processing Systems 

The paper presents HAAR, a hierarchical reinforcement learning approach based on the idea of using the advantage (temporal-difference error) of the high-level controller to provide the reward signal for the lower layer. The reviewers judged this approach to be novel and the empirical results to be promising. Analytical results provide improvement guarantees similar to those of a base algorithm such as TRPO. Several areas for improvement were mentioned, and many of these were addressed in the rebuttal. For example, the reviewers were pleased to see the additional experiment showing performance from random skill initialization.
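
To make the core idea concrete, the following is a minimal sketch of how a high-level advantage could be turned into a low-level reward. It is not the authors' implementation. It assumes a one-step temporal-difference estimate of the advantage and an even split of that advantage over the k low-level steps executed during one high-level step; the function names, the even split, and the discount value are all illustrative assumptions.

```python
from typing import List


def td_advantage(reward: float, value_s: float, value_next: float,
                 gamma: float = 0.99) -> float:
    """One-step TD error, used here as a simple advantage estimate (assumption)."""
    return reward + gamma * value_next - value_s


def auxiliary_low_level_rewards(high_reward: float, value_s: float,
                                value_next: float, k: int,
                                gamma: float = 0.99) -> List[float]:
    """Spread the high-level advantage evenly over k low-level steps.

    The even split is an illustrative assumption, not a detail taken
    from the review text.
    """
    if k <= 0:
        raise ValueError("k must be a positive number of low-level steps")
    advantage = td_advantage(high_reward, value_s, value_next, gamma)
    return [advantage / k] * k


# Hypothetical numbers: one high-level step that spans 4 low-level steps.
rewards = auxiliary_low_level_rewards(
    high_reward=1.0, value_s=0.5, value_next=1.0, k=4, gamma=0.5
)
```

In this sketch, a high-level step that turned out better than the high-level value function expected yields positive low-level rewards, so the low-level skill is reinforced for behavior that helped the high-level policy.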