Review for NeurIPS paper: Self-Paced Deep Reinforcement Learning

Neural Information Processing Systems 

Summary and Contributions: After reading the authors' response, I've updated my score from (4) to (5). A fixed set of curriculum tasks is given, and the algorithm can sample tasks from this set at every step. The hope is that selecting tasks smartly and adaptively will speed up learning. The final goal is to maximize performance with respect to a fixed, known target distribution over tasks. The proposed algorithm alternates between two types of steps: policy improvement for a fixed task (or "context") distribution, and "task distribution adjustment" for a fixed policy.
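
To make the alternating structure concrete, here is a minimal toy sketch of such a loop. It only illustrates the two-step alternation described above; the update rules, the per-task "skill" vector standing in for a policy, and all function and parameter names (`improve_policy`, `adjust_task_distribution`, `self_paced_loop`, `lr`, `step`) are my own hypothetical stand-ins, not the paper's actual updates.

```python
# Toy sketch of the alternation only. The "policy" is a per-task skill in [0, 1];
# both update rules are invented for illustration and are NOT the paper's method.

def improve_policy(skill, task_dist, lr=0.5):
    """Policy step for a FIXED task distribution (hypothetical stand-in)."""
    # Tasks that are sampled more often receive more practice; skill is capped at 1.
    return [min(1.0, s + lr * p) for s, p in zip(skill, task_dist)]


def adjust_task_distribution(skill, task_dist, target_dist, step=0.3):
    """Task-distribution step for a FIXED policy (hypothetical stand-in)."""
    # Expected performance of the current policy under the current distribution.
    perf = sum(p * s for p, s in zip(task_dist, skill))
    # Move toward the known target distribution, faster when performance is high.
    alpha = step * perf
    return [(1 - alpha) * p + alpha * q for p, q in zip(task_dist, target_dist)]


def self_paced_loop(skill, task_dist, target_dist, n_iters=20):
    """Alternate the two steps for a fixed number of iterations."""
    for _ in range(n_iters):
        skill = improve_policy(skill, task_dist)
        task_dist = adjust_task_distribution(skill, task_dist, target_dist)
    return skill, task_dist


if __name__ == "__main__":
    # Start with all mass on task 0; the target puts all mass on task 2.
    skill, dist = self_paced_loop([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
    print("skill per task:", skill)
    print("task distribution:", dist)
```

In this toy, the sampling distribution drifts toward the target only as fast as the current policy performs well on the tasks it is currently given, which is the intuition I take from the summary above.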