A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

Chen, Zaiwei, Maguluri, Siva Theja, Shakkottai, Sanjay, Shanmugam, Karthikeyan

Feb-2-2021–arXiv.org Machine Learning

Reinforcement Learning (RL) is a promising approach to solve sequential decision making problems in complex and stochastic systems [38]. Despite the empirical successes of RL [32, 33], the convergence properties of RL algorithms are not well understood. In particular, even in the basic tabular setting (i.e., without using function approximation), finite-sample convergence guarantees of many popular RL algorithms are in general not established. Most of the value-based RL algorithms can be viewed as Stochastic Approximation (SA) algorithms for solving suitable Bellman's equations. Due to the nature of sampling in RL, many such algorithms inevitably perform the so-called asynchronous update. That is, in each iteration, only a subset of the components of the vector-valued iterate is updated. Moreover, the components being updated are usually selected in a stochastic manner along a single trajectory based on an underlying Markov chain. Handling such asynchronous updates is one of the main challenges in analyzing the behavior of RL algorithms.

algorithm, assumption 2, convergence, (15 more...)

arXiv.org Machine Learning

Feb-2-2021

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Texas > Travis County > Austin (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.83)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.35)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found