Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Apr-27-2026, 01:40:14 GMT–Neural Information Processing Systems

In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order eO(H p dl1T) in the time inhomogeneous reinforcement learning problem where H is the episode length and dl1 is the Kolmogorov l1 dimension of the space of environments. We then find concrete bounds of dl1 in a variety of settings, such as tabular, linear and finite mixtures, and discuss how how our results are either the first of their kind or improve the state-of-the-art.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Apr-27-2026, 01:40:14 GMT

Conferences PDF

Add feedback

Country:
- North America > United States (0.68)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Education > Focused Education > Special Education (0.45)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
4a17cd29ced0443bcff689fbb0d32d5e-Paper-Conference.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found