The Asymptotic Convergence-Rate of Q-learning

Szepesvári, Csaba

Neural Information Processing Systems 

Q-learning is a popular reinforcement learning (RL) algorithm whose convergence is well demonstrated in the literature (Jaakkola et al., 1994; Tsitsiklis, 1994; Littman and Szepesvári, 1996; Szepesvári and Littman, 1996). Our aim in this paper is to provide an upper bound for the convergence rate of (lookup-table based) Q-learning algorithms. Although this upper bound is not strict, computer experiments (to be presented elsewhere) and the form of the lemma underlying the proof indicate that the obtained upper bound can be made strict by a slightly more complicated definition for R. Our results extend to learning on aggregated states (see Singh et al., 1995) and to other related algorithms which admit a certain form of asynchronous stochastic approximation (see Szepesvári and Littman, 1996).

Present address: Associative Computing, Inc., Budapest, Konkoly Thege M. u. 29-33, HUNGARY-1121
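For readers unfamiliar with the algorithm being analyzed, below is a minimal sketch of lookup-table (tabular) Q-learning with the standard one-step update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). The environment interface (env.reset, env.step, env.sample_action), the epsilon-greedy exploration, and the 1/visits(s,a) step size are illustrative assumptions, not claims about the paper's exact setting; the paper's analysis concerns how the choice of such decaying step sizes governs the asymptotic rate.

    import numpy as np

    def q_learning(env, n_states, n_actions, gamma=0.9, eps=0.1, episodes=500):
        # Lookup-table Q-learning: one Q-value per state-action pair.
        Q = np.zeros((n_states, n_actions))
        visits = np.zeros((n_states, n_actions))  # per-pair update counts
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                # Epsilon-greedy action selection (one common exploration scheme).
                if np.random.rand() < eps:
                    a = env.sample_action()
                else:
                    a = int(Q[s].argmax())
                s_next, r, done = env.step(a)  # assumed (state, reward, done) interface
                visits[s, a] += 1
                alpha = 1.0 / visits[s, a]  # decaying step size (illustrative choice)
                # One-step Q-learning update toward the bootstrapped target.
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                s = s_next
        return Q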
