Reviews: Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
Neural Information Processing Systems
Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds

This paper presents a new algorithm for efficient exploration in Markov decision processes. The algorithm is an optimistic variant of posterior sampling, similar in flavour to BOSS. The authors prove new minimax performance bounds for this approach that are state of the art in this setting. There are a lot of things to like about this paper:
- The paper is well written and clear overall. I would say that most of the key insights come from the earlier "Gaussian-Dirichlet dominance" argument of Osband et al., but there are some significant extensions and results that may be of wider interest to the community.
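For readers unfamiliar with the optimistic twist on posterior sampling the review alludes to, here is a minimal, hypothetical sketch in a Bernoulli-bandit setting (not the paper's MDP algorithm): for each arm, draw several samples from its posterior and keep the maximum, then act greedily with respect to those optimistic values. All function and variable names below are illustrative assumptions.

```python
import random

def optimistic_posterior_sample(successes, failures, num_samples=10, rng=None):
    """Toy optimistic posterior sampling for a Bernoulli bandit.

    For each arm, draw `num_samples` values from its Beta posterior and keep
    the maximum (the 'optimistic' twist, BOSS-style), then return the index
    of the arm with the highest optimistic value. Purely illustrative; the
    reviewed paper works with full MDPs, not bandits.
    """
    rng = rng or random.Random()
    values = []
    for s, f in zip(successes, failures):
        # Beta(s + 1, f + 1) posterior under a uniform prior.
        draws = [rng.betavariate(s + 1, f + 1) for _ in range(num_samples)]
        values.append(max(draws))
    return max(range(len(values)), key=lambda i: values[i])
```

Taking the maximum over several posterior draws inflates each arm's value estimate, which encourages exploration of under-sampled arms while still concentrating on good arms as the posteriors sharpen.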