Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Neural Information Processing Systems 

Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been well studied from the optimization perspective, where the goal is to efficiently find a near-optimal policy when the system parameters are known. However, very few papers adopt a learning perspective, in which the parameters are unknown. In this paper, we analyze the performance of Thompson sampling in episodic restless bandits with unknown parameters. We consider a general policy map to define our competitor and prove an \tilde{O}(\sqrt{T}) Bayesian regret bound.
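As background for readers unfamiliar with the method, the following is a minimal sketch of Thompson sampling in the simplest stationary Bernoulli-bandit setting, not the episodic restless setting analyzed in the paper; the arm means, Beta(1,1) prior, and horizon are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-armed Bernoulli bandit; the true means are illustrative only.
true_means = np.array([0.3, 0.5, 0.7])
n_arms = len(true_means)

# Beta(1, 1) priors on each arm's unknown success probability.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

T = 5000
pulls = np.zeros(n_arms, dtype=int)
for t in range(T):
    # Sample a mean for each arm from its posterior, then act greedily
    # on the sampled values (the core of Thompson sampling).
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = int(rng.random() < true_means[arm])
    # Conjugate Beta-Bernoulli posterior update.
    alpha[arm] += reward
    beta[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)
```

Over time the posterior concentrates and the best arm (index 2 here) is pulled most often; in the restless setting, the sampled object is instead a full system parameter, and the agent follows the mapped policy for that sample over an episode.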