Self-Paced Deep Reinforcement Learning
–Neural Information Processing Systems
Recently, an increasing number of algorithms for curriculum generation have been proposed, empirically demonstrating that CL is an appropriate tool to improve the sample efficiency of DRL algorithms [9, 10]. However, these algorithms are based on heuristics and concepts that are, as of now, theoretically not well understood, preventing the establishment of rigorous improvements. In contrast, we propose to generate the curriculum based on a principled inference view on RL. Our approach generates the curriculum based on two quantities: the value function of the agent and the KL divergence to a target distribution of tasks.
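To make the interplay of these two quantities concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small discrete set of tasks, a hypothetical trade-off weight `alpha`, and made-up value estimates; the function names are illustrative only. It scores a task distribution by its expected value minus a KL penalty towards the target distribution, and uses the standard closed-form maximiser of that score.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def curriculum_objective(p, values, target, alpha):
    """Expected agent value under p, minus alpha * KL(p || target).

    `alpha` is a hypothetical trade-off weight introduced for this sketch.
    """
    expected_value = sum(pi * vi for pi, vi in zip(p, values))
    return expected_value - alpha * kl_divergence(p, target)


def best_task_distribution(values, target, alpha):
    """Maximiser of the objective above over discrete distributions.

    The closed form is p(c) proportional to target(c) * exp(V(c) / alpha).
    The maximum value is subtracted before exponentiating for stability.
    """
    top = max(values)
    weights = [t * math.exp((v - top) / alpha) for v, t in zip(values, target)]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up numbers: task 0 is easy for the agent, task 2 is what we care about.
values = [1.0, 0.5, 0.0]   # assumed value estimates per task
target = [0.1, 0.2, 0.7]   # assumed target distribution of tasks

value_driven = best_task_distribution(values, target, alpha=0.1)
target_driven = best_task_distribution(values, target, alpha=100.0)
```

With a small `alpha` the sampled tasks concentrate where the agent already achieves high value; with a large `alpha` the distribution stays close to the target. How such a weight would be chosen or scheduled is not stated in the abstract and is left open here.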
Feb-8-2026, 18:04:50 GMT
- Country:
- Europe
- Finland (0.04)
- Germany > Hesse
- Darmstadt Region > Darmstadt (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- North America > Canada
- Europe
- Technology: