Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
–Neural Information Processing Systems
We investigate the stochastic combinatorial multi-armed bandit problem with semi-bandit feedback (CMAB). In CMAB, the existence of an efficient policy with optimal asymptotic regret (up to a factor poly-logarithmic in the action size) remains an open question for many families of distributions, including mutually independent outcomes and, more generally, the multivariate sub-Gaussian family.
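To make the setting concrete, the following is a minimal sketch of Thompson sampling in a combinatorial semi-bandit. It is an illustration under stated assumptions, not the policy analyzed in the paper: actions are assumed to be all subsets of m base arms (so the optimization oracle is a simple top-m selection), outcomes are assumed to be independent unit-variance Gaussians, and each arm mean gets a standard normal prior. The function name `combinatorial_thompson_sampling` and its parameters are hypothetical.

```python
import numpy as np


def combinatorial_thompson_sampling(true_means, m, horizon, seed=0):
    """Gaussian Thompson sampling on a top-m combinatorial semi-bandit.

    Assumptions (illustrative only): actions are all size-m subsets of arms,
    outcomes are independent N(mean, 1), and each mean has a N(0, 1) prior.
    Returns the cumulative pseudo-regret and the per-arm observation counts.
    """
    true_means = np.asarray(true_means, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(true_means)
    counts = np.zeros(n)  # number of observations of each base arm
    sums = np.zeros(n)    # sum of observed outcomes of each base arm
    best = np.sort(true_means)[-m:].sum()  # expected reward of the best action
    regret = 0.0

    for _ in range(horizon):
        # Posterior of each mean under the N(0, 1) prior and unit-variance
        # noise is N(sums / (counts + 1), 1 / (counts + 1)).
        post_mean = sums / (counts + 1.0)
        post_std = 1.0 / np.sqrt(counts + 1.0)
        theta = rng.normal(post_mean, post_std)

        # Oracle step: with a linear reward over size-m subsets, the best
        # action for the sampled means is the m arms with largest samples.
        action = np.argsort(theta)[-m:]

        # Semi-bandit feedback: the outcome of every chosen arm is observed.
        outcomes = rng.normal(true_means[action], 1.0)
        counts[action] += 1
        sums[action] += outcomes

        regret += best - true_means[action].sum()

    return regret, counts
```

For example, `combinatorial_thompson_sampling([0.1, 0.2, 0.9, 0.8, 0.3], m=2, horizon=2000)` returns the cumulative pseudo-regret together with the number of times each base arm was played. For richer action sets (paths, matchings, and so on), the top-m step would be replaced by the corresponding combinatorial oracle.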
May-29-2025, 01:08:30 GMT
- Country:
- North America > United States
- California (0.14)
- New York (0.14)
- Genre:
- Research Report (0.46)
- Technology: