Goto

Collaborating Authors

 njt



Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback

Neural Information Processing Systems

Motivated by the scenario of large-scale learning in distributed systems, this paper studies a scenario where M agents cooperate together to solve the same instance of a K-armed stochastic bandit problem. The agents have limited access to a local subset of arms and are asynchronous with different gaps between decision-making rounds. The goal is to find the global optimal arm, and agents are able to pull any arm; however, they can only observe the reward when the selected arm is local. The challenge is a tradeoff for agents between pulling a local arm with observable feedback or pulling external arms without feedback and relying on others' observations that occur at different rates. We propose AAE-LCB, a two-stage algorithm that prioritizes pulling local arms following an active arm elimination policy and switches to other arms only if all local arms are dominated by some external arms. We analyze the regret of AAE-LCBand show it matches the regret lower bound up to a small factor.


4a5876b450b45371f6cfe5047ac8cd45-Supplemental.pdf

Neural Information Processing Systems

In the following equation, we use the results inAppendix D.1 tocalculate the probability that there exists some arm whose mean value isaboveitsconfidence intervalofwidth