Reviews: Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
–Neural Information Processing Systems
The paper presents HAAR, a hierarchical reinforcement learning approach in which the advantage or temporal-difference error of the high-level controller provides the reward signal for the lower layer. The reviewers judged this approach to be novel and the empirical results promising. The analytical results provide improvement guarantees similar to those of a base algorithm like TRPO. Several areas for improvement were mentioned, and many of these were addressed in the rebuttal. For example, the reviewers were pleased to see the additional experiment showing performance from random skill initialization.
Jan-25-2025, 04:18:29 GMT