Multi-Agent Multi-Armed Bandits with Limited Communication

Agarwal, Mridul, Aggarwal, Vaneet, Azizzadenesheli, Kamyar

arXiv.org Artificial Intelligence 

We consider the problem where N agents collaboratively interact with an instance of a stochastic K arm bandit problem for K N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best ( (K/N) arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of Õ N)T, communicates for O(logT) steps and broadcasts O(logK) bits in each communication step. Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithm perform well and outperform strategies that communicate through a central node. We consider a setup where N agents connected over a network, interact with a multi armed bandit (MAB) environment (Lattimore and Szepesvári, 2020). The agents aim to collaborate with other agents in the network to minimize their regret. The agents also aim to reduce the number of messages and the size of messages communicated with others.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found