A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

Neural Information Processing Systems

Joulani et al. (2013) studied multi-armed bandits with delayed feedback under the assumption that the rewards are stochastic and the delays are sampled from a fixed distribution.
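The stochastic delayed-feedback setting can be made concrete with a small simulation. The sketch below is illustrative only. It assumes Bernoulli rewards, integer delays drawn independently from a user-supplied sampler, and a plain UCB1 learner that updates only on feedback that has already arrived. It is not the algorithm of Joulani et al. (2013) or the best-of-both-worlds algorithm of this paper. The function `simulate_delayed_ucb` and its parameters are hypothetical names chosen for the example.

```python
import heapq
import math
import random


def simulate_delayed_ucb(means, horizon, delay_sampler, seed=0):
    """Hypothetical sketch: UCB1 on Bernoulli arms with stochastically delayed rewards.

    means: assumed true success probabilities of the arms.
    delay_sampler: callable(rng) -> non-negative int delay, drawn from a fixed
        distribution. A reward generated in round t with delay d becomes visible
        before the decision in round t + d + 1, so d = 0 means no delay.
    Returns (pulls per arm, total number of rewards observed by the learner).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # times each arm was played
    observed = [0] * k       # rewards that have arrived for each arm
    reward_sum = [0.0] * k   # sum of arrived rewards for each arm
    pending = []             # min-heap of (arrival_round, arm, reward)

    for t in range(horizon):
        # Deliver every reward whose delay has elapsed by the start of round t.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            observed[arm] += 1
            reward_sum[arm] += reward

        # Arms with no arrived feedback are explored first (least-played first).
        unobserved = [a for a in range(k) if observed[a] == 0]
        if unobserved:
            arm = min(unobserved, key=lambda a: pulls[a])
        else:
            # UCB1 index built only from observed rewards, not from all plays.
            n_obs = sum(observed)
            arm = max(
                range(k),
                key=lambda a: reward_sum[a] / observed[a]
                + math.sqrt(2 * math.log(n_obs) / observed[a]),
            )

        pulls[arm] += 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = delay_sampler(rng)
        heapq.heappush(pending, (t + delay + 1, arm, reward))

    return pulls, sum(observed)
```

For example, `simulate_delayed_ucb([0.3, 0.5, 0.7], horizon=5000, delay_sampler=lambda rng: rng.randint(0, 20))` runs the learner with delays drawn uniformly from 0 to 20 rounds. The delays only postpone when statistics are updated, so the learner still concentrates its plays on the best arm. At most the last 21 rewards remain in flight when the horizon ends.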