Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Neural Information Processing Systems 

In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We present a refined analysis of the information ratio and show an upper bound of order \widetilde{O}(H\sqrt{d_{l_1}T}) for the time-inhomogeneous reinforcement learning problem, where H is the episode length and d_{l_1} is the Kolmogorov l_1-dimension of the space of environments. We then derive concrete bounds on d_{l_1} in a variety of settings, such as tabular, linear, and finite mixtures, and discuss how our results improve on the state of the art.
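To make the object of the analysis concrete, the following is a minimal sketch of Thompson Sampling (posterior sampling) for episodic tabular RL, the simplest setting the abstract covers. It is an illustration, not the paper's algorithm or analysis: the tiny MDP, the Dirichlet/Beta conjugate priors, and all dimensions (S, A, H) are assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tabular episodic MDP: S states, A actions, horizon H (all assumed).
S, A, H = 3, 2, 4
true_P = rng.dirichlet(np.ones(S), size=(S, A))  # unknown transition kernel
true_R = rng.uniform(size=(S, A))                # unknown Bernoulli reward means

# Conjugate priors: Dirichlet over each transition row, Beta over each reward.
P_counts = np.ones((S, A, S))
R_alpha = np.ones((S, A))
R_beta = np.ones((S, A))

def plan(P, R):
    """Backward induction over the horizon: optimal policy for a sampled MDP."""
    V = np.zeros((H + 1, S))
    policy = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = R + P @ V[h + 1]          # (S, A) action values at step h
        policy[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return policy

for episode in range(200):
    # Thompson Sampling: draw one environment from the posterior ...
    P_hat = np.array([[rng.dirichlet(P_counts[s, a]) for a in range(A)]
                      for s in range(S)])
    R_hat = rng.beta(R_alpha, R_beta)
    policy = plan(P_hat, R_hat)       # ... and act optimally for that sample
    s = 0
    for h in range(H):
        a = policy[h, s]
        r = rng.binomial(1, true_R[s, a])
        s_next = rng.choice(S, p=true_P[s, a])
        # Posterior updates from the observed reward and transition.
        P_counts[s, a, s_next] += 1
        R_alpha[s, a] += r
        R_beta[s, a] += 1 - r
        s = s_next
```

The information-ratio analysis in the paper bounds the Bayesian regret of exactly this kind of posterior-sampling loop, with d_{l_1} capturing how complex the space of candidate environments (here, the Dirichlet/Beta posterior's support) is.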