Reviews: Is Q-Learning Provably Efficient?
This paper studies the problem of efficient exploration in finite episodic MDPs. The authors present a variant of Q-learning with optimistic initialization and carefully tuned learning rates that recovers a UCB-style algorithm. The main contribution of this work is a polynomial regret bound for perhaps one of the most iconic "model-free" algorithms. There are several things to like about this paper:

- Q-learning is perhaps the classic introductory RL algorithm, so it is nice to see that we can recover sample-efficient guarantees for a variant of it. The computational time is also particularly appealing compared to existing model-free algorithms with $\sqrt{T}$ *expected* (Bayesian) regret (such as RLSVI), which have much higher computational and memory requirements.
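To make the algorithmic idea concrete, here is a minimal sketch of tabular Q-learning with optimistic initialization, the $(H+1)/(H+t)$ learning rate, and a Hoeffding-style UCB bonus, in the spirit of the paper. The `env` interface (`reset()`/`step()`), the bonus constant `c`, and the failure probability `delta` are assumptions for illustration, not the paper's code.

```python
import numpy as np

def q_learning_ucb(env, S, A, H, K, c=1.0, delta=0.05):
    """Sketch: episodic Q-learning with a UCB-style exploration bonus.

    Assumes a hypothetical tabular env with env.reset() -> state and
    env.step(a) -> (next_state, reward), and episodes of exactly H steps.
    """
    iota = np.log(S * A * H * K / delta)       # log factor inside the bonus
    Q = np.full((H, S, A), float(H))           # optimistic initialization: Q_h(s, a) = H
    V = np.zeros((H + 1, S))                   # V_{H+1} = 0 by convention
    N = np.zeros((H, S, A), dtype=int)         # visit counts N_h(s, a)

    for _ in range(K):
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))        # act greedily w.r.t. the optimistic Q
            s_next, r = env.step(a)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)          # tuned learning rate alpha_t = (H+1)/(H+t)
            bonus = c * np.sqrt(H**3 * iota / t)  # UCB exploration bonus ~ sqrt(H^3 * iota / t)
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + V[h + 1, s_next] + bonus)
            V[h, s] = min(H, Q[h, s].max())    # clip at H to keep value estimates bounded
            s = s_next
    return Q
```

The per-step work is a single table update and an argmax over actions, which is what makes the memory and compute footprint so much lighter than model-based methods or RLSVI-style regression.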