First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs

1,2 Department of Computer Science, University of British Columbia