Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
Kearns, Michael J., Singh, Satinder P.
In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.
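For concreteness, the following is a minimal tabular sketch (not from the paper) contrasting the two approaches the abstract refers to: the direct Q-learning update applied per observed transition, and the indirect approach that estimates next-state distributions from transition counts and then runs off-line value iteration on the estimated model. All function names, array shapes, and the handling of rewards here are illustrative assumptions.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Direct (model-free) approach: one Q-learning update per observed transition.
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def indirect_value_iteration(counts, rewards, gamma=0.9, iters=100):
    # Indirect (model-based) approach: estimate next-state distributions from
    # transition counts, then run off-line value iteration on the estimated model.
    # counts has shape (S, A, S); rewards has shape (S, A) (assumed known or estimated).
    S, A, _ = counts.shape
    totals = counts.sum(axis=2, keepdims=True)
    P_hat = counts / np.maximum(totals, 1)        # empirical P(s' | s, a)
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                         # greedy state values
        Q = rewards + gamma * (P_hat @ V)         # Bellman backup under the estimated model
    return Q

# Example usage on a random 5-state, 2-action MDP (hypothetical data):
rng = np.random.default_rng(0)
counts = rng.integers(0, 10, size=(5, 2, 5)).astype(float)
rewards = rng.random((5, 2))
Q_model = indirect_value_iteration(counts, rewards)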
Neural Information Processing Systems
Dec-31-1999