A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning