Review for NeurIPS paper: Self-Paced Deep Reinforcement Learning
Neural Information Processing Systems
Summary and Contributions: After reading the authors' response, I have updated my score from 4 to 5. The paper considers a setting in which a fixed set of curriculum tasks is given, and the algorithm may sample a task from this set at every step. The hope is that selecting tasks smartly and adaptively will speed up learning. The final goal is to maximize performance with respect to a fixed, known target distribution over tasks. The proposed algorithm alternates between two types of steps: policy improvement under a fixed task (or "context") distribution, and "task distribution adjustment" under a fixed policy.
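To make the alternating scheme described above concrete, here is a toy sketch. It is not the paper's algorithm. It assumes a small discrete task set, replaces the policy with a single hypothetical "skill" scalar, and uses a hypothetical context update that maximizes expected performance minus an `alpha`-weighted KL penalty toward the target distribution `mu` and a `beta`-weighted KL penalty toward the previous distribution (a penalty in place of a hard trust-region constraint). The names `context_update`, `expected_return`, `run`, and the schedule for `alpha` are all illustrative assumptions.

```python
import numpy as np

def context_update(p_old, mu, perf, alpha, beta):
    """Hypothetical closed-form step: maximize over the simplex
    E_p[perf] - alpha * KL(p || mu) - beta * KL(p || p_old).
    Setting the Lagrangian gradient to zero gives
    p ∝ exp((perf + alpha*log(mu) + beta*log(p_old)) / (alpha + beta))."""
    logits = (perf + alpha * np.log(mu) + beta * np.log(p_old)) / (alpha + beta)
    logits -= logits.max()  # numerical stability before exponentiating
    p = np.exp(logits)
    return p / p.sum()

def expected_return(skill, difficulty):
    # Toy stand-in for a policy's return on each task: sigmoid(skill - difficulty).
    return 1.0 / (1.0 + np.exp(difficulty - skill))

def run(n_iters=200, seed=0):
    rng = np.random.default_rng(seed)
    difficulty = np.array([0.0, 2.0, 4.0, 6.0])
    mu = np.array([0.01, 0.01, 0.01, 0.97])  # assumed target: mostly the hardest task
    p = np.array([0.97, 0.01, 0.01, 0.01])   # start almost entirely on the easiest task
    skill, lr, beta = 0.0, 0.5, 1.0
    history = [p.copy()]
    for it in range(n_iters):
        # Step 1: "policy improvement" with the task distribution p held fixed.
        # Toy learning rule: progress is fastest on tasks of intermediate difficulty.
        tasks = rng.choice(len(p), size=32, p=p)
        perf = expected_return(skill, difficulty)
        skill += lr * np.mean(perf[tasks] * (1.0 - perf[tasks]))
        # Step 2: "task distribution adjustment" with the policy (skill) held fixed.
        alpha = 0.05 * (it + 1)  # hypothetical schedule: weight on the target grows
        perf = expected_return(skill, difficulty)
        p = context_update(p, mu, perf, alpha, beta)
        history.append(p.copy())
    return skill, p, history

skill, p, history = run()
print(f"final skill={skill:.2f}, final task distribution={np.round(p, 3)}")
```

In this toy, the growing `alpha` gradually pulls the sampling distribution from the easy starting task toward the target, while the `beta` term keeps each adjustment small relative to the previous distribution.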
Jan-25-2025, 06:54:32 GMT