Fast Policy Learning through Imitation and Reinforcement