Finite-Sample Analysis of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

Chen, Suei-Wen, Ross, Keith, Youssef, Pierre

Oct-3-2024–arXiv.org Artificial Intelligence

Monte Carlo Exploring Starts (MCES), which aims to learn the optimal policy using only sample returns, is a simple and natural algorithm in reinforcement learning which has been shown to converge under various conditions. However, the convergence rate analysis for MCES-style algorithms in the form of sample complexity has received very little attention. In this paper we develop a finite sample bound for a modified MCES algorithm which solves the stochastic shortest path problem. To this end, we prove a novel result on the convergence rate of the policy iteration algorithm. This result implies that with probability at least $1-\delta$, the algorithm returns an optimal policy after $\tilde{O}(SAK^3\log^3\frac{1}{\delta})$ sampled episodes, where $S$ and $A$ denote the number of states and actions respectively, $K$ is a proxy for episode length, and $\tilde{O}$ hides logarithmic factors and constants depending on the rewards of the environment that are assumed to be known.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

Oct-3-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.04)
  - Massachusetts > Middlesex County
    - Belmont (0.04)
- Europe > France
  - Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- Asia > Middle East
  - UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.70)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found