A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
Neural Information Processing Systems
We present a modified tuning of the algorithm of Zimmert and Seldin [2020] for adversarial multi-armed bandits with delayed feedback. In addition to the minimax-optimal adversarial regret guarantee shown by Zimmert and Seldin, the modified tuning simultaneously achieves a near-optimal regret guarantee in the stochastic setting with fixed delays.
Feb-8-2026, 19:32:14 GMT