Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

Oct-3-2024, 04:20:16 GMT–Neural Information Processing Systems

We present an algorithm based on posterior sampling (also known as Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating and has a finite, though unknown, diameter.
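
To make the term "posterior sampling" concrete, the following is a minimal sketch of the generic idea for a tabular MDP: keep transition counts, then draw a plausible transition model from a Dirichlet posterior over each state-action pair. It is not the algorithm from the paper, whose optimistic construction is not described on this page. The function names, the uniform prior of 1.0, and the example counts are all assumptions made for illustration.

```python
import random


def sample_transition_row(counts, prior=1.0, rng=random):
    """Draw one next-state distribution from a Dirichlet(counts + prior) posterior.

    `prior` is a hypothetical uniform pseudo-count, not a value from the paper.
    """
    # A Dirichlet draw equals independent Gamma draws normalized to sum to 1.
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mdp(counts, prior=1.0, rng=random):
    """Sample a full transition model.

    counts[s][a][s2] is the number of observed transitions s --a--> s2.
    Returns P with the same shape, where each P[s][a] is a distribution.
    """
    return [
        [sample_transition_row(row, prior, rng) for row in per_state]
        for per_state in counts
    ]


# Illustrative counts for a 2-state, 2-action MDP (made-up numbers).
rng = random.Random(0)
counts = [
    [[5, 1], [0, 3]],
    [[2, 2], [1, 7]],
]
sampled = sample_mdp(counts, rng=rng)
```

A posterior-sampling agent would typically plan in the sampled model, act, update the counts, and resample. How often to resample and how to make the samples optimistic are design choices this sketch leaves open.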

Country:
- North America > United States
  - California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom
  - England > Greater London > London (0.04)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.85)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.35)
