Reviews: Is Q-Learning Provably Efficient?
–Neural Information Processing Systems
This paper studies the problem of efficient exploration in finite episodic MDPs. The authors present a variant of Q-learning with optimistic initialization and tuned learning rates that recovers a UCB-style algorithm. The main contribution of this work is a polynomial regret bound for perhaps one of the most iconic "model-free" algorithms. There are several things to like about this paper:
- Q-learning is perhaps the classic introductory RL algorithm, so it's nice to see that we can recover sample-efficient guarantees for a variant of it. The computational cost is also particularly appealing compared to existing model-free algorithms with √T *expected* (Bayesian) regret (such as RLSVI), which have much higher computational and memory requirements.
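To make the reviewed algorithm concrete, here is a minimal sketch of tabular Q-learning with optimistic initialization, the tuned learning rate α_t = (H+1)/(H+t), and a Hoeffding-style UCB bonus, in the spirit of the paper. The toy MDP interface, the fixed initial state, and the bonus constant `c` are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

def ucb_q_learning(P, R, H, K, c=1.0, seed=0):
    """Sketch of Q-learning with UCB-Hoeffding bonuses on a finite
    episodic MDP with S states, A actions, horizon H, over K episodes.
    P has shape (S, A, S) with rows summing to 1; R has shape (S, A)
    with rewards in [0, 1]. The constant c is an assumed tuning knob."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    # Optimistic initialization: Q = H upper-bounds any value in [0, H].
    Q = np.full((H, S, A), float(H))
    V = np.zeros((H + 1, S))           # V[H] = 0 at the terminal step
    N = np.zeros((H, S, A), dtype=int)
    iota = np.log(S * A * H * K + 1)   # log factor inside the bonus
    total_reward = 0.0
    for k in range(K):
        s = 0                          # assumed fixed initial state
        for h in range(H):
            a = int(np.argmax(Q[h, s]))
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1) / (H + t)          # tuned learning rate
            b = c * np.sqrt(H**3 * iota / t)   # UCB exploration bonus
            r = R[s, a]
            s_next = rng.choice(S, p=P[s, a])
            target = r + V[h + 1, s_next] + b
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * target
            V[h, s] = min(H, Q[h, s].max())    # clip value at H
            total_reward += r
            s = s_next
    return Q, V, total_reward

# Usage on a small random MDP (purely illustrative):
rng = np.random.default_rng(1)
S, A, H, K = 4, 2, 3, 200
P = rng.dirichlet(np.ones(S), size=(S, A))   # random transition kernel
R = rng.random((S, A))                       # rewards in [0, 1]
Q, V, total = ucb_q_learning(P, R, H, K)
```

The per-step work is O(A) (one argmax and one table update), which is the source of the computational advantage over methods like RLSVI that maintain and resample regression solutions.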