Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective
arXiv.org Artificial Intelligence
TD-learning is a fundamental algorithm in the field of reinforcement learning (RL) that is employed to evaluate a given policy by estimating the corresponding value function for a Markov decision process. While significant progress has been made in the theoretical analysis of TD-learning, recent research has established guarantees on its statistical efficiency by developing finite-time error bounds. This paper aims to contribute to the existing body of knowledge by presenting a novel finite-time analysis of tabular temporal difference (TD) learning, which makes direct and effective use of discrete-time stochastic linear system models and leverages Schur matrix properties. The proposed analysis covers both on-policy and off-policy settings in a unified manner. By adopting this approach, we hope to offer new and straightforward templates that not only shed further light on the analysis of TD-learning and related RL algorithms but also provide valuable insights for future research in this domain.
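For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of tabular TD(0) policy evaluation. The environment (a 5-state random walk) and all parameters are illustrative assumptions for this sketch and are not taken from the paper:

```python
import random

def td0_random_walk(episodes=10000, alpha=0.05, gamma=1.0, seed=0):
    """Tabular TD(0) on a 5-state random walk (states 1..5 non-terminal,
    0 and 6 terminal; reward 1 only on reaching state 6).

    The true values under the uniform-random policy are V(s) = s/6.
    This is a generic textbook instance, not the paper's experimental setup.
    """
    rng = random.Random(seed)
    V = [0.0] * 7  # value estimates; terminals stay at 0
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, 6):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == 6 else 0.0
            # Tabular TD(0) update: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V
```

With a constant step size the estimates fluctuate around the true values V(s) = s/6; the finite-time bounds the paper develops quantify how fast such estimates concentrate.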
Jun-2-2023