Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits
Ronshee Chawla, Daniel Vial, Sanjay Shakkottai, R. Srikant
–arXiv.org Artificial Intelligence
The multi-armed bandit (MAB) problem is a paradigm for sequential decision-making under uncertainty, which involves allocating a resource to an action in order to obtain a reward. MABs address the tradeoff between exploration and exploitation while making sequential decisions. Owing to their utility in large-scale distributed systems (such as information retrieval [38], advertising [8], etc.), multi-agent versions of the classical MAB have been studied extensively in the last few years. In multi-agent MABs, multiple agents learn bandits while communicating over a network. The goal is to design communication strategies that allow efficient exploration of arms across agents, so that the agents can outperform single-agent MAB algorithms.

There exist many versions of multi-agent MABs in the literature (see Section 1.2 for an overview). We propose a new collaborative setting in which each of the N agents is learning one of M stochastic MABs (where each bandit has K arms and M < N), with the goal of minimizing the group cumulative regret, i.e., the sum of the individual cumulative regrets of all the agents.
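To make the objective concrete, here is a minimal formalization under standard bandit notation; the symbols below (m(i), a_i(t), mu) are illustrative assumptions, not taken verbatim from the paper. Let m(i) in {1, ..., M} denote the bandit assigned to agent i, let \mu_{m,k} be the mean reward of arm k of bandit m, and let a_i(t) be the arm agent i pulls at round t. The group cumulative regret over a horizon T is then

% Group cumulative regret: sum of per-agent regrets (illustrative notation).
R(T) \;=\; \sum_{i=1}^{N} R_i(T)
     \;=\; \sum_{i=1}^{N} \mathbb{E}\!\left[\sum_{t=1}^{T}
           \Bigl(\mu^{*}_{m(i)} - \mu_{m(i),\,a_i(t)}\Bigr)\right],
\qquad \mu^{*}_{m(i)} = \max_{1 \le k \le K} \mu_{m(i),k}.

Since M < N, several agents necessarily share the same bandit, which is what makes communication useful: agents learning the same bandit can split the cost of exploring its K arms rather than each paying it in full.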
May-30-2023