Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Christoph Dann, Tor Lattimore, Emma Brunskill

Oct-2-2024, 20:18:46 GMT – Neural Information Processing Systems

Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications such as healthcare. This paper introduces Uniform-PAC, a new framework for theoretically measuring the performance of such algorithms that strengthens the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version can be used to derive high-probability regret guarantees, and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees, up to a factor of the horizon.
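The abstract does not spell out the quantities involved, so the following is only an illustrative sketch of why a guarantee that holds for all accuracy levels at once can connect to regret. It assumes the usual conventions: each episode has a nonnegative optimality gap, a PAC-style count records how many episodes have a gap larger than ε, and regret is the sum of the gaps. The function names and the gap values are hypothetical and are not taken from the paper.

```python
def mistake_count(gaps, eps):
    """Number of episodes whose optimality gap exceeds eps (PAC-style count)."""
    return sum(1 for g in gaps if g > eps)


def regret(gaps):
    """Cumulative regret: the sum of the per-episode optimality gaps."""
    return sum(gaps)


def regret_from_mistake_counts(gaps):
    """Recover regret by integrating the mistake count over all eps >= 0.

    The mistake count is a step function of eps that only changes at the
    distinct gap values, so the integral is an exact finite sum.
    Assumes every gap is nonnegative.
    """
    levels = sorted(set(gaps) | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        # For eps in [lo, hi) the count is constant and equals the count at lo.
        total += mistake_count(gaps, lo) * (hi - lo)
    return total


# Hypothetical per-episode gaps, chosen only for illustration.
example_gaps = [0.5, 0.25, 0.25, 0.125, 0.0]
print(mistake_count(example_gaps, 0.2))          # episodes that are not 0.2-optimal
print(regret(example_gaps))                      # sum of gaps
print(regret_from_mistake_counts(example_gaps))  # same value, via the counts
```

Because the regret equals the integral of the mistake count over ε, a bound that controls the count for every ε simultaneously also controls the regret, whereas a bound for a single fixed ε does not. This is one way to read the bridge the abstract describes; the paper's own argument and constants may differ.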


