Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Oct-2-2025, 03:01:38 GMT–Neural Information Processing Systems

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (P AC guarantee).

probability, reinforcement learning, sample complexity, (12 more...)

Neural Information Processing Systems

Oct-2-2025, 03:01:38 GMT

Conferences PDF

Add feedback

Country:
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.67)