Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
This is the second post of the series, in which we discuss a novel hierarchical reinforcement learning (HRL) algorithm built upon HIerarchical Reinforcement learning with Off-policy correction (HIRO), which we covered in the previous post. This post comprises two sections. In the first section, we compare the representation-learning architectures of this method and of HIRO; we then start from Claim 4 in the paper, showing how to learn good representations that lead to bounded sub-optimality and how the intrinsic reward for the low-level policy is defined; we provide pseudocode for the algorithm at the end of the section. In the Discussion section, we offer some insight into the algorithm and connect the low-level policy to a probabilistic graphical model to build intuition. Unlike in HIRO, where goals serve as a measure of dissimilarity between the current state and the desired state, goals here are used directly, in conjunction with the current state, to produce the low-level policy.
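The contrast in how goals are used can be sketched in a few lines. This is a toy illustration, not code from either paper: the function and variable names are our own, and the linear-tanh policy stands in for whatever network the low-level controller actually uses.

```python
import numpy as np

def hiro_intrinsic_reward(state, goal, next_state):
    """HIRO: the goal is a desired state offset, and the low-level reward
    measures dissimilarity between the reached state and state + goal."""
    return -np.linalg.norm(state + goal - next_state)

def low_level_policy(state, goal, weights):
    """Representation-learning variant: the goal is fed directly into the
    low-level policy together with the current state (toy linear policy)."""
    features = np.concatenate([state, goal])
    return np.tanh(weights @ features)

state = np.zeros(2)
goal = np.array([1.0, 0.0])
next_state = np.array([0.5, 0.0])
print(hiro_intrinsic_reward(state, goal, next_state))   # -0.5
```

In the first function the goal never enters the policy itself; it only shapes the reward. In the second, the goal is an input that, together with the state, determines the action.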
Sep-5-2019, 11:48:32 GMT