DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

Neural Information Processing Systems 

Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these challenges is key for building self-improving agents with superhuman ability. We argue that solving complex, high-dimensional tasks requires solving simpler tasks that are relevant to the target task. In contrast, most prior work designs strategies for selecting exploratory tasks with the objective of solving any task, which makes exploration in challenging high-dimensional, long-horizon tasks intractable. We find that the sense of direction necessary for effective exploration can be extracted from existing RL algorithms, without needing any prior information.