Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms

Kearns, Michael J., Singh, Satinder P.

Neural Information Processing Systems 

In this paper, we address two issues of longstanding interest in the reinforcement learning literature. First, what kinds of performance guarantees can be made for Q-learning after only a finite number of actions? Second, what quantitative comparisons can be made between Q-learning and model-based (indirect) approaches, which use experience to estimate next-state distributions for off-line value iteration? We first show that both Q-learning and the indirect approach enjoy rather rapid convergence to the optimal policy as a function of the number of state transitions observed.
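To make the direct/indirect contrast in the abstract concrete, below is a minimal Python sketch (not from the paper) of both estimators on a toy tabular MDP. The paper's analysis, to our understanding, is carried out in an idealized parallel sampling model; this sketch instead follows a single trajectory with uniform random exploration, and names such as `q_learning` and `indirect` are illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular MDP: random dynamics and rewards (illustrative, not from the paper).
n_states, n_actions, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # true next-state dists
R = rng.random((n_states, n_actions))                             # true expected rewards

def q_learning(n_steps, alpha=0.1):
    """Direct method: update Q online from each observed transition."""
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)                # uniform exploration
        s2 = rng.choice(n_states, p=P[s, a])       # sample next state
        # Standard Q-learning update toward the one-step bootstrap target.
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
        s = s2
    return Q

def indirect(n_steps, n_iters=200):
    """Indirect method: estimate next-state distributions, then run
    off-line value iteration on the estimated model."""
    counts = np.zeros((n_states, n_actions, n_states))
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)
        s2 = rng.choice(n_states, p=P[s, a])
        counts[s, a, s2] += 1
        s = s2
    # Empirical transition model (small smoothing so unseen pairs normalize).
    P_hat = (counts + 1e-8) / (counts + 1e-8).sum(axis=2, keepdims=True)
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):                       # value iteration on the model
        Q = R + gamma * P_hat @ Q.max(axis=1)
    return Q

print(q_learning(10_000).round(2))
print(indirect(10_000).round(2))
```

With the same stream of observed transitions, both routines recover near-optimal Q-values on this toy problem; the indirect method spends the experience on estimating `P_hat` and then plans, while Q-learning folds each transition directly into its value estimates.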
