Is Q-Learning Provably Efficient?

Jin, Chi; Allen-Zhu, Zeyuan; Bubeck, Sebastien; Jordan, Michael I.

Neural Information Processing Systems 

Model-free reinforcement learning (RL) algorithms directly parameterize and update value functions or policies, bypassing the modeling of the environment. They are typically simpler, more flexible to use, and thus more prevalent in modern deep RL than model-based approaches. However, empirical work has suggested that they require large numbers of samples to learn. The theoretical question of whether or not model-free algorithms are in fact \emph{sample efficient} is one of the most fundamental questions in RL. The problem is unsolved even in the basic scenario with finitely many states and actions.