Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement

Neural Information Processing Systems 

Reinforcement learning (RL) often faces uninformed search problems, in which the agent must explore without access to domain knowledge such as characteristics of the environment or external rewards.