Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Neural Information Processing Systems 

We study reinforcement learning for discounted Markov Decision Processes (MDPs) in the tabular setting. We propose a model-based algorithm, UCBVI-γ, which combines the principle of optimism in the face of uncertainty with a Bernstein-type bonus.
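
To make the ingredients named above concrete, the following is a minimal sketch of optimistic value iteration on an empirical tabular model with a variance-dependent (Bernstein-style) exploration bonus. It is not the UCBVI-γ algorithm itself: the function name `optimistic_value_iteration`, the `log_term` parameter, the bonus constants, and the truncation at 1/(1-γ) are all illustrative assumptions, and rewards are assumed to lie in [0, 1].

```python
import numpy as np


def optimistic_value_iteration(counts, rewards, gamma, log_term=1.0, n_iters=200):
    """Illustrative optimistic planning step for a discounted tabular MDP.

    counts[s, a, s2] -- observed transition counts (hypothetical input format)
    rewards[s, a]    -- known rewards, assumed to lie in [0, 1]
    log_term         -- stands in for the logarithmic confidence factor (assumption)
    """
    n_states, n_actions, _ = counts.shape
    v_max = 1.0 / (1.0 - gamma)  # largest possible discounted return

    # Visit counts N(s, a), floored at 1 to avoid division by zero.
    n = np.maximum(counts.sum(axis=2), 1)
    p_hat = counts / n[:, :, None]  # empirical transition model

    # Start from the optimistic upper bound.
    q = np.full((n_states, n_actions), v_max)
    for _ in range(n_iters):
        v = q.max(axis=1)
        mean = p_hat @ v  # empirical expectation of the next-state value
        var = np.maximum(p_hat @ (v ** 2) - mean ** 2, 0.0)  # empirical variance
        # Bernstein-style bonus: a variance term plus a lower-order 1/N term.
        bonus = np.sqrt(2.0 * var * log_term / n) + v_max * log_term / n
        # Optimistic Bellman backup, truncated at the trivial upper bound.
        q = np.minimum(rewards + gamma * mean + bonus, v_max)
    return q
```

In this sketch, state-action pairs that were never visited keep the maximal value 1/(1-γ), which is what drives exploration; as counts grow, the bonus shrinks and the estimate approaches the value under the empirical model.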