Finite-Sample Analysis for SARSA with Linear Function Approximation

Zou, Shaofeng, Xu, Tengyu, Liang, Yingbin

Mar-19-2020, 00:03:45 GMT–Neural Information Processing Systems

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under the non-i.i.d.\ setting, where a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically. However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels.

algorithm, linear function approximation, sarsa algorithm, (2 more...)

Neural Information Processing Systems

Mar-19-2020, 00:03:45 GMT

Conferences Web Page

Add feedback

Genre:
- Research Report (0.42)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.64)