A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
Chen, Zaiwei, Maguluri, Siva Theja, Shakkottai, Sanjay, Shanmugam, Karthikeyan
Reinforcement Learning (RL) is a promising approach to solve sequential decision making problems in complex and stochastic systems [38]. Despite the empirical successes of RL [32, 33], the convergence properties of RL algorithms are not well understood. In particular, even in the basic tabular setting (i.e., without using function approximation), finite-sample convergence guarantees of many popular RL algorithms are in general not established. Most of the value-based RL algorithms can be viewed as Stochastic Approximation (SA) algorithms for solving suitable Bellman's equations. Due to the nature of sampling in RL, many such algorithms inevitably perform the so-called asynchronous update. That is, in each iteration, only a subset of the components of the vector-valued iterate is updated. Moreover, the components being updated are usually selected in a stochastic manner along a single trajectory based on an underlying Markov chain. Handling such asynchronous updates is one of the main challenges in analyzing the behavior of RL algorithms.
Feb-2-2021
- Country:
- North America > United States
- Texas > Travis County > Austin (0.04)
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Genre:
- Research Report (0.83)