Self-Paced Deep Reinforcement Learning

Neural Information Processing Systems 

Recently, an increasing number of algorithms for curriculum generation have been proposed, empirically demonstrating that CL is an appropriate tool to improve the sample efficiency of DRL algorithms [9, 10]. However, these algorithms are based on heuristics and concepts that are not yet theoretically well understood, preventing the establishment of rigorous improvements. In contrast, we propose to generate the curriculum based on a principled inference view on RL. Our approach generates the curriculum based on two quantities: the value function of the agent and the KL divergence to a target distribution of tasks.
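A minimal sketch of how these two quantities can shape a curriculum, assuming a finite task set and a KL-penalized objective (this is an illustrative simplification, not the paper's exact algorithm; the function name, the temperature-like parameter alpha, and all numbers are hypothetical):

```python
import numpy as np

def curriculum_distribution(values, target_probs, alpha):
    """Solve max_p E_p[V] - alpha * KL(p || mu) in closed form:
    p(c) proportional to mu(c) * exp(V(c) / alpha)."""
    logits = np.log(target_probs) + values / alpha
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Hypothetical numbers: three tasks, the agent currently performs best on task 0.
values = np.array([5.0, 1.0, -2.0])        # value-function estimates per task
target_probs = np.array([0.1, 0.3, 0.6])   # target task distribution mu(c)

for alpha in [10.0, 1.0, 0.1]:
    print(alpha, curriculum_distribution(values, target_probs, alpha))
# A large alpha keeps the curriculum close to the target distribution,
# while a small alpha concentrates sampling on tasks the agent already solves well.
```

Intuitively, weighting the KL term more strongly over the course of training moves the sampled tasks from those the agent can currently solve toward the desired target distribution.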
