Multi-Agent Multi-Armed Bandits with Limited Communication

Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli

Feb-10-2021–arXiv.org Artificial Intelligence

We consider the problem where N agents collaboratively interact with an instance of a stochastic K-armed bandit problem for K ≫ N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of Õ(√((K/N + N)T)), communicates for O(log T) steps, and broadcasts O(log K) bits in each communication step. Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.

We consider a setup where N agents, connected over a network, interact with a multi-armed bandit (MAB) environment (Lattimore and Szepesvári, 2020). The agents aim to collaborate with other agents in the network to minimize their regret. The agents also aim to reduce the number and size of the messages they communicate to others.
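The doubling-epoch idea described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' LCC-UCB procedure: the function name `run_sketch`, the round-robin split of arms among agents, the Bernoulli rewards, the UCB1 index, and the choice to recommend the most-pulled arm are all hypothetical choices made for illustration. Each agent runs UCB over its own roughly K/N arms plus at most N recommended arms, and broadcasts a single arm index when an epoch ends.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """Standard UCB1 index; arms that were never pulled get priority."""
    if pulls == 0:
        return float("inf")
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_sketch(arm_means, n_agents, horizon, seed=0):
    """Illustrative doubling-epoch loop (assumptions, not the paper's exact algorithm).

    Returns (total_reward, communication_rounds, bits_per_message).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    # Assumption: arms are split round-robin, so each agent owns about K/N arms.
    own = [list(range(a, k, n_agents)) for a in range(n_agents)]
    pulls = [[0] * k for _ in range(n_agents)]
    sums = [[0.0] * k for _ in range(n_agents)]
    recommended = [set() for _ in range(n_agents)]

    def candidates(a):
        # Own arms plus arms recommended by others: at most K/N + N arms.
        return own[a] + sorted(recommended[a] - set(own[a]))

    def index(a, i, t):
        mean = sums[a][i] / pulls[a][i] if pulls[a][i] else 0.0
        return ucb_index(mean, pulls[a][i], t)

    total_reward, rounds, t, epoch_len = 0.0, 0, 0, 1
    while t < horizon:
        for _ in range(min(epoch_len, horizon - t)):
            t += 1
            for a in range(n_agents):
                arm = max(candidates(a), key=lambda i: index(a, i, t))
                # Assumption: Bernoulli rewards with the given means.
                reward = 1.0 if rng.random() < arm_means[arm] else 0.0
                pulls[a][arm] += 1
                sums[a][arm] += reward
                total_reward += reward
        # End of epoch: every agent broadcasts one arm index.
        # Assumption: the "best" arm is the one the agent pulled most often.
        best = [max(candidates(a), key=lambda i: pulls[a][i])
                for a in range(n_agents)]
        for a in range(n_agents):
            recommended[a] = set(best)
        rounds += 1
        epoch_len *= 2  # doubling keeps the number of rounds logarithmic in T

    bits = max(1, math.ceil(math.log2(k)))  # bits needed to name one arm index
    return total_reward, rounds, bits
```

With a horizon of 1000 steps the epochs have lengths 1, 2, 4, ..., so the loop finishes after 10 communication rounds, and with K = 16 arms each broadcast needs 4 bits. This matches the O(log T) rounds and O(log K) bits per round stated above, though the sketch makes no claim about reproducing the regret bound.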



