Multi-Agent Multi-Armed Bandits with Limited Communication
Agarwal, Mridul; Aggarwal, Vaneet; Azizzadenesheli, Kamyar
arXiv.org Artificial Intelligence
We consider the problem where N agents collaboratively interact with an instance of a stochastic K-armed bandit problem for K ≫ N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of Õ(√((K/N + N)T)), communicates for O(log T) steps, and broadcasts O(log K) bits in each communication step. Finally, we empirically show that the LCC-UCB and LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.

We consider a setup where N agents, connected over a network, interact with a multi-armed bandit (MAB) environment (Lattimore and Szepesvári, 2020). The agents aim to collaborate with other agents in the network to minimize their regret. The agents also aim to reduce the number and size of the messages they communicate to others.
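The doubling-epoch idea described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' implementation: the round-robin split of arms, the Bernoulli rewards, the UCB1 exploration bonus, the fully connected broadcast, the choice of the most-pulled arm as the shared recommendation, and the names `run_sketch` and `ucb_index` are all hypothetical choices made for illustration.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1 index: empirical mean plus an exploration bonus."""
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_sketch(means, n_agents, horizon, seed=0):
    """Simulate N agents on a Bernoulli bandit with doubling epochs.

    Returns (total_reward, communication_rounds, bits_per_message).
    """
    rng = random.Random(seed)
    k = len(means)
    # Assumption: arms are split round-robin, so each agent owns about K/N arms.
    own = [list(range(a, k, n_agents)) for a in range(n_agents)]
    # Arms recommended by the other agents in the latest round (at most N).
    recommended = [set() for _ in range(n_agents)]
    pulls = [[0] * k for _ in range(n_agents)]
    sums = [[0.0] * k for _ in range(n_agents)]
    total_reward, rounds, t, epoch_len = 0.0, 0, 0, 1

    while t < horizon:
        steps = min(epoch_len, horizon - t)
        for a in range(n_agents):
            # Each agent plays only its own arms plus the recommended ones.
            active = sorted(set(own[a]) | recommended[a])
            for _ in range(steps):
                untried = [i for i in active if pulls[a][i] == 0]
                if untried:
                    arm = untried[0]
                else:
                    n_a = sum(pulls[a][i] for i in active)
                    arm = max(
                        active,
                        key=lambda i: ucb_index(
                            sums[a][i] / pulls[a][i], pulls[a][i], n_a
                        ),
                    )
                reward = 1.0 if rng.random() < means[arm] else 0.0
                pulls[a][arm] += 1
                sums[a][arm] += reward
                total_reward += reward
        t += steps
        epoch_len *= 2  # doubling epochs give O(log T) communication rounds
        if t < horizon:
            # Assumption: the "best arm it knows" is the agent's most-pulled arm.
            best = []
            for a in range(n_agents):
                active = sorted(set(own[a]) | recommended[a])
                best.append(max(active, key=lambda i: pulls[a][i]))
            # Every agent broadcasts a single arm index to all other agents.
            for a in range(n_agents):
                recommended[a] = set(best)
            rounds += 1

    # One arm index needs about log2(K) bits.
    bits = max(1, math.ceil(math.log2(k)))
    return total_reward, rounds, bits


if __name__ == "__main__":
    reward, rounds, bits = run_sketch([0.1] * 15 + [0.9], n_agents=4, horizon=1000)
    print(reward, rounds, bits)
```

In this sketch each agent's active set never exceeds roughly K/N + N arms, which mirrors the K/N + N term in the stated regret bound, and each message is a single arm index. With 16 arms and a horizon of 1000 steps, the doubling schedule produces 9 communication rounds of 4 bits per message.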
Feb-10-2021