Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Apr-25-2026, 20:00:13 GMT - Neural Information Processing Systems

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon H with S states and A actions. The performance of an agent is measured by its regret after interacting with the environment for T episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of posterior sampling that needs only a number of posterior samples logarithmic in H, S, A, and T per state-action pair.
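
To make the idea of "a few posterior samples per state-action pair" concrete, here is a minimal sketch, not the paper's algorithm as published. It assumes known rewards in [0, 1], a Dirichlet posterior over transitions with a uniform prior, and optimism obtained by taking the maximum over the sampled models. The names `num_samples`, `optimistic_q_values`, and the constant `c` are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def num_samples(H, S, A, T, c=1.0):
    """Hypothetical sample count that grows logarithmically in H, S, A, and T."""
    return max(1, math.ceil(c * math.log(H * S * A * T)))


def optimistic_q_values(counts, rewards, J, rng):
    """Backward induction using a max over J posterior samples per (h, s, a).

    counts:  array of shape (H, S, A, S) with observed transition counts.
    rewards: array of shape (H, S, A) with rewards in [0, 1] (assumed known).
    J:       number of posterior samples drawn per state-action pair.
    """
    H, S, A, _ = counts.shape
    V = np.zeros((H + 1, S))  # V[H] = 0 is the terminal value
    Q = np.zeros((H, S, A))
    for h in range(H - 1, -1, -1):
        for s in range(S):
            for a in range(A):
                # Dirichlet posterior with one pseudo-count per next state (assumption).
                alpha = counts[h, s, a] + 1.0
                samples = rng.dirichlet(alpha, size=J)  # shape (J, S)
                # Optimism: keep the most favourable sampled transition model.
                Q[h, s, a] = rewards[h, s, a] + np.max(samples @ V[h + 1])
        V[h] = Q[h].max(axis=1)
    return Q, V


rng = np.random.default_rng(0)
H, S, A, T = 4, 3, 2, 1000
J = num_samples(H, S, A, T)
counts = rng.integers(0, 5, size=(H, S, A, S)).astype(float)
rewards = rng.uniform(0.0, 1.0, size=(H, S, A))
Q, V = optimistic_q_values(counts, rewards, J, rng)
```

An agent would then act greedily with respect to `Q` during the episode and update `counts` with the observed transitions. Because `J` is logarithmic in the problem parameters, the per-episode sampling cost stays small even for long runs.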

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States > New York (0.28)

Genre:
- Research Report (0.47)
- Workflow (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)
