Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leiblier Maillard Sampling
Qin, Hao, Jun, Kwang-Sung, Zhang, Chicheng
–arXiv.org Artificial Intelligence
We study the problem of Multi-Armed Bandits (MAB) with reward distributions belonging to a One-Parameter Exponential Distribution (OPED) family. In the literature, several criteria have been proposed to evaluate the performance of such algorithms, including Asymptotic Optimality (A.O.), Minimax Optimality (M.O.), Sub-UCB, and variance-adaptive worst-case regret bound. Thompson Sampling (TS)-based and Upper Confidence Bound (UCB)-based algorithms have been employed to achieve some of these criteria. However, none of these algorithms simultaneously satisfy all the aforementioned criteria.
arXiv.org Artificial Intelligence
Feb-20-2025
- Country:
- Oceania > Australia
- North America > United States
- Arizona (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Genre:
- Research Report (0.82)
- Industry:
- Technology: