Reviews: Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
Neural Information Processing Systems
This is an interesting approach and seems novel in the context of options, although it has some similarities to potential-based reward shaping, e.g. Devlin and Kudenko (2012). The main advantages claimed for HAAR are (loosely) improved performance under sparse rewards and the learning of skills suitable for transfer. These claims could be made more explicit, which would help to justify the experimental section.

The authors define the advantage as:

A_h(s_t^h, a_t^h) = E[r_t^h + \gamma_h V_h(s_{t+k}^h) - V_h(s_t^h)]

The meaning of this is a little ambiguous, and I would prefer it to be clarified.
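As a point of reference for the definition being questioned, a one-sample estimate of that high-level advantage can be sketched as follows. This is an illustrative sketch only; the function and argument names are my own assumptions, not the authors' implementation, and k (the low-level skill length) is folded into the sampled next value.

```python
def high_level_advantage(r_t, v_s_t, v_s_tk, gamma_h=0.99):
    """One-sample estimate of the high-level advantage

        A_h = r_t^h + gamma_h * V_h(s_{t+k}^h) - V_h(s_t^h)

    where v_s_t and v_s_tk are the high-level value estimates at the
    current and next high-level state. Names are hypothetical.
    """
    return r_t + gamma_h * v_s_tk - v_s_t
```

Under this reading, the expectation in the paper's equation is over the k-step trajectory generated by the low-level policy, which is one of the points the notation leaves ambiguous.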
Jan-25-2025, 04:35:27 GMT