Robust Batched Bandits

Guo, Yunwen, Shu, Yunlun, Zhuo, Gongyi, Wang, Tianyu

Oct-7-2025–arXiv.org Machine Learning

The batched multi-armed bandit (MAB) problem, in which rewards are collected in batches, is crucial for applications such as clinical trials. Existing research predominantly assumes light-tailed reward distributions, yet many real-world scenarios, including clinical outcomes, exhibit heavy-tailed characteristics. This paper bridges this gap by proposing robust batched bandit algorithms designed for heavy-tailed rewards, within both finite-arm and Lipschitz-continuous settings. We reveal a surprising phenomenon: in the instance-independent regime, as well as in the Lipschitz setting, heavier-tailed rewards necessitate a smaller number of batches to achieve near-optimal regret. In stark contrast, for the instance-dependent setting, the required number of batches to attain near-optimal regret remains invariant with respect to tail heaviness.

algorithm, bandit problem, batch, (15 more...)

arXiv.org Machine Learning

Oct-7-2025

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (1.00)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.88)