Reviews: Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies

Neural Information Processing Systems 

The paper introduces an RL problem in which the agent must execute a given subtask graph describing a set of subtasks and their dependencies, and proposes a neural subtask graph solver (NSS) to solve it. NSS consists of an observation module that captures environment information with a CNN and a task module that encodes the subtask graph with a recursive-reverse-recursive neural network (R3NN). A non-parametric reward-propagation policy (RProp) is used to pre-train the NSS agent, which is then fine-tuned with an actor-critic method.

Overall, the problem introduced in this paper is interesting, and the combination of a CNN to capture the observation information with an R3NN to encode the subtask graph is a good idea.

Cons:
1. Writing: many details of the proposed method are deferred to the supplementary material, which makes the paper difficult to follow from the main text alone.
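For concreteness, the recursive-reverse-recursive encoding of the subtask graph can be pictured as a bottom-up pass followed by a top-down pass over the dependency DAG. The following is a toy numpy sketch, not the authors' implementation: the graph, dimensions, weights, and aggregation rule are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical subtask DAG: an edge (s, t) means subtask s is a
# precondition of subtask t. Nodes 0..3 are in topological order.
edges = [(0, 2), (1, 2), (2, 3)]
n, d = 4, 8

parents = {i: [s for s, t in edges if t == i] for i in range(n)}
children = {i: [t for s, t in edges if s == i] for i in range(n)}

W_up = rng.normal(scale=0.1, size=(d, d))    # made-up weights
W_down = rng.normal(scale=0.1, size=(d, d))
x = rng.normal(size=(n, d))                  # per-subtask input features

# Bottom-up (recursive) pass: each node aggregates the embeddings
# of its preconditions before computing its own embedding.
up = np.zeros((n, d))
for i in range(n):
    agg = sum((up[p] for p in parents[i]), np.zeros(d))
    up[i] = np.tanh(x[i] + agg @ W_up)

# Top-down (reverse-recursive) pass: context flows back from the
# subtasks that depend on each node.
down = np.zeros((n, d))
for i in reversed(range(n)):
    agg = sum((down[c] for c in children[i]), np.zeros(d))
    down[i] = np.tanh(up[i] + agg @ W_down)

# Final per-subtask embedding combines both directions; a policy
# head would score subtasks from these embeddings.
emb = np.concatenate([up, down], axis=1)
print(emb.shape)
```

The two-pass structure is what lets each subtask's embedding reflect both its preconditions and the subtasks it unlocks.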