First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs

(1,2) Department of Computer Science, University of British Columbia

Neural Information Processing Systems 

Standard reinforcement learning (RL) agents never intelligently explore like a human (i.e.