Variance-Aware Feel-Good Thompson Sampling for Contextual Bandits

Jun-11-2026, 03:10:01 GMT–Neural Information Processing Systems

Variance-dependent regret bounds have received increasing attention in recent studies on contextual bandits. However, most of these studies are focused on upper confidence bound (UCB)-based bandit algorithms, while sampling based bandit algorithms such as Thompson sampling are still understudied. The only exception is the `LinVDTS` algorithm (Xu et al., 2023), which is limited to linear reward function and its regret bound is not optimal with respect to the model dimension. In this paper, we present `FGTSVA`, a variance-aware Thompson Sampling algorithm for contextual bandits with general reward function with optimal regret bound. At the core of our analysis is an extension of the decoupling coefficient, a technique commonly used in the analysis of Feel-good Thompson sampling (FGTS) that reflects the complexity of the model space.

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Jun-11-2026, 03:10:01 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)