Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning

Moradipari, Ahmadreza, Pedramfar, Mohammad, Zini, Modjtaba Shokrian, Aggarwal, Vaneet

Oct-30-2023–arXiv.org Machine Learning

In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments. We then find concrete bounds on $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results are either the first of their kind or improve the state-of-the-art.
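
To make the scaling of the stated bound concrete, the following is a minimal sketch that evaluates only the leading-order term $H\sqrt{d_{l_1}T}$. The function name `regret_bound_scale` and the numeric inputs are hypothetical, chosen for illustration; the constants and logarithmic factors hidden by the $\widetilde{O}$ notation are not given in the abstract and are omitted here, so the returned values indicate growth rates rather than actual regret.

```python
import math


def regret_bound_scale(H: int, d_l1: float, T: int) -> float:
    """Leading-order term H * sqrt(d_l1 * T) of the stated bound.

    Assumption: constants and polylogarithmic factors hidden by the
    tilde-O notation are dropped, so only ratios between values are
    meaningful.
    """
    if H <= 0 or d_l1 < 0 or T < 0:
        raise ValueError("H must be positive; d_l1 and T must be non-negative")
    return H * math.sqrt(d_l1 * T)


# Hypothetical inputs, not taken from the paper.
base = regret_bound_scale(H=10, d_l1=4.0, T=10_000)
quadrupled_T = regret_bound_scale(H=10, d_l1=4.0, T=40_000)
doubled_H = regret_bound_scale(H=20, d_l1=4.0, T=10_000)

# Quadrupling T doubles the term (square-root growth in T),
# while doubling H doubles it directly (linear growth in H).
print(quadrupled_T / base, doubled_H / base)
```

Under these assumptions the term grows with the square root of $T$ and of $d_{l_1}$, and linearly in $H$, which is the sense in which a smaller $d_{l_1}$ for a given setting yields a tighter bound.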

artificial intelligence, machine learning, reinforcement learning


