Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full-planning