The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Liu, Shuze, Chen, Shuhang, Zhang, Shangtong
–arXiv.org Artificial Intelligence
Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.
arXiv.org Artificial Intelligence
Feb-5-2024
- Country:
- North America
- Canada > Alberta (0.14)
- United States
- New York (0.04)
- Virginia > Albemarle County
- Charlottesville (0.04)
- Massachusetts > Middlesex County
- Belmont (0.04)
- Europe > United Kingdom
- England
- Cambridgeshire > Cambridge (0.04)
- Oxfordshire > Oxford (0.04)
- England
- Asia > Japan
- Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
- North America
- Genre:
- Research Report (0.50)
- Technology: