Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback
Motivated by large-scale learning in distributed systems, this paper studies a setting where M agents cooperate to solve the same instance of a K-armed stochastic bandit problem. Each agent has access to only a local subset of arms, and agents are asynchronous, with different gaps between decision-making rounds. The goal is to identify the globally optimal arm; agents may pull any arm, but they observe the reward only when the selected arm is local. The challenge is the tradeoff each agent faces between pulling a local arm with observable feedback and pulling an external arm without feedback, relying instead on other agents' observations, which arrive at different rates. We propose AAE-LCB, a two-stage algorithm that prioritizes local arms following an active arm elimination policy and switches to external arms only when every local arm is dominated by some external arm. We analyze the regret of AAE-LCB and show that it matches the regret lower bound up to a small factor.
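To make the two-stage structure concrete, below is a minimal Python sketch of one decision round, based only on the description above. The function name `aae_lcb_step`, the shared-statistics format, the confidence radius, and the round-robin tie-breaking are all illustrative assumptions, not the paper's exact algorithm or constants.

```python
import math

def aae_lcb_step(shared_stats, local_arms, t, delta=0.01):
    """One decision round of a hypothetical AAE-LCB-style policy.

    shared_stats[k] = (pull_count, empirical_mean) for arm k, aggregated
    across all agents (external arms' stats arrive via communication).
    local_arms is the set of arm indices this agent can observe rewards for.
    Returns (chosen_arm, feedback_observed).
    """
    K = len(shared_stats)

    def radius(n):
        # Illustrative confidence radius; the paper's constants may differ.
        return math.inf if n == 0 else math.sqrt(2 * math.log(1 / delta) / n)

    ucb = [shared_stats[k][1] + radius(shared_stats[k][0]) for k in range(K)]
    lcb = [shared_stats[k][1] - radius(shared_stats[k][0]) for k in range(K)]

    # Stage 1: active arm elimination restricted to local arms.
    best_local_lcb = max(lcb[j] for j in local_arms)
    active_local = [k for k in local_arms if ucb[k] >= best_local_lcb]
    external = [k for k in range(K) if k not in local_arms]

    # A local arm is dominated if some external arm's LCB exceeds its UCB.
    all_dominated = bool(external) and all(
        any(lcb[j] > ucb[k] for j in external) for k in active_local
    )

    if not all_dominated:
        # Stage 1: round-robin over surviving local arms (reward observed).
        return active_local[t % len(active_local)], True

    # Stage 2: pull the external arm with the highest LCB; no reward is
    # observed here, so its statistics improve only through other agents.
    return max(external, key=lambda k: lcb[k]), False
```

For instance, with `shared_stats = [(10, 0.3), (10, 0.9), (0, 0.0)]` and `local_arms = {0}`, arm 1's LCB eventually exceeds arm 0's UCB, at which point the sketch switches from pulling the local arm 0 to pulling the external arm 1 without feedback.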
Neural Information Processing Systems