Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

Neural Information Processing Systems 

The results are based on a novel analysis of real-time dynamic programming, which is then extended to model-based RL. Specifically, we generalize existing algorithms that perform full planning so that they act by 1-step planning.
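To make the contrast between full planning and 1-step planning concrete, here is a minimal sketch of classic real-time dynamic programming (RTDP) on a tabular, finite-horizon MDP. It assumes the transition model `P` and rewards `R` (in [0, 1]) are known exactly, so it is not the model-based RL algorithm analyzed here, which must work with an estimated model. The function names, the `P[s][a]` list-of-probabilities layout, and the optimistic initialization `V[h][s] = H - h` are illustrative assumptions. `backward_induction` re-plans over every state, while `rtdp` performs a single greedy one-step backup at each state it actually visits.

```python
import random

def backward_induction(P, R, H):
    """Full planning (assumed baseline): exact finite-horizon value iteration over all states."""
    S, A = len(P), len(P[0])
    V = [[0.0] * S for _ in range(H + 1)]  # V[H] = 0 at the horizon
    for h in range(H - 1, -1, -1):
        for s in range(S):
            V[h][s] = max(
                R[s][a] + sum(p * V[h + 1][t] for t, p in enumerate(P[s][a]))
                for a in range(A)
            )
    return V

def rtdp(P, R, H, s0, episodes, seed=0):
    """1-step planning (sketch): back up only the states visited along sampled trajectories."""
    rng = random.Random(seed)
    S, A = len(P), len(P[0])
    # Optimistic init: with rewards in [0, 1], H - h upper-bounds the optimal value at step h.
    V = [[float(H - h)] * S for h in range(H + 1)]
    for _ in range(episodes):
        s = s0
        for h in range(H):
            # One-step lookahead against the current estimate of V[h + 1].
            q = [R[s][a] + sum(p * V[h + 1][t] for t, p in enumerate(P[s][a]))
                 for a in range(A)]
            a = max(range(A), key=q.__getitem__)  # greedy action (first maximizer on ties)
            V[h][s] = q[a]  # update the visited state only
            s = rng.choices(range(S), weights=P[s][a])[0]  # sample the next state
    return V
```

On a two-state deterministic toy MDP where action 1 in state 0 moves to state 1 and action 0 in state 1 stays there with reward 1, `rtdp(P, R, 3, 0, 10)[0][0]` and `backward_induction(P, R, 3)[0][0]` both give 2.0. Because of the optimistic initialization, the RTDP estimates stay upper bounds on the optimal values while touching only the states the greedy policy reaches.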