On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits

Jialin Yi, Milan Vojnović

arXiv.org Machine Learning 

Coordinating multiple agents that can communicate with each other to make decisions under uncertainty is a classical problem with many applications in computer science (Lynch, 1996), game theory (Chakravarty et al., 2014) and machine learning (Lanctot et al., 2017). We consider the multi-agent version of the multi-armed bandit problem, which is one of the most fundamental decision-making problems under uncertainty. In this problem, a learning agent must handle the exploration-exploitation trade-off, i.e. balance exploring various actions to learn how rewarding they are against selecting actions already known to be highly rewarding. In the multi-agent version of this problem, multiple agents collaborate with each other to maximize their individual cumulative rewards, and the challenge is to design efficient cooperative algorithms under communication constraints. We consider the nonstochastic (adversarial) multi-armed bandit problem in a cooperative multi-agent setting, with K ≥ 2 arms and N ≥ 1 agents.
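To make the single-agent building block of the nonstochastic setting concrete, here is a minimal sketch of the classical loss-based Exp3 strategy for one agent. It is not the cooperative algorithm studied here, and it models no communication between agents. It assumes an oblivious adversary that fixes a loss in [0, 1] for every arm and round in advance (a reward r corresponds to a loss 1 − r). The function name `exp3` and the learning rate `eta` are illustrative choices. The sketch shows the exploration-exploitation trade-off: the agent samples arms from exponential weights and observes only the loss of the arm it pulled. It then uses an importance-weighted estimate to keep the loss estimates unbiased.

```python
import math
import random


def exp3(losses, eta, seed=0):
    """Single-agent, loss-based Exp3 sketch (illustrative, not the paper's algorithm).

    losses: T lists of length K with entries in [0, 1], fixed in advance
            by an (assumed) oblivious adversary.
    eta:    learning rate (hypothetical tuning parameter).
    Returns (list of pulled arms, total incurred loss).
    """
    rng = random.Random(seed)
    K = len(losses[0])
    cum_est = [0.0] * K  # cumulative importance-weighted loss estimates per arm
    pulls, total = [], 0.0
    for loss_t in losses:
        # Exponential weights; subtracting the minimum avoids overflow.
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [wi / s for wi in w]
        # Explore/exploit: sample an arm from the current distribution.
        arm = rng.choices(range(K), weights=p)[0]
        # Bandit feedback: only the pulled arm's loss is observed.
        observed = loss_t[arm]
        total += observed
        pulls.append(arm)
        # Importance weighting (divide by p[arm]) keeps the estimate unbiased.
        cum_est[arm] += observed / p[arm]
    return pulls, total


if __name__ == "__main__":
    T, K = 2000, 3
    # Arm 1 always has loss 0; arms 0 and 2 always have loss 1.
    losses = [[1.0, 0.0, 1.0] for _ in range(T)]
    eta = math.sqrt(2 * math.log(K) / (T * K))
    pulls, total = exp3(losses, eta, seed=1)
    print(total, pulls[-10:])
```

In this toy run, the agent's regret against the best fixed arm equals its total loss, because the best arm has zero loss. The agent should concentrate its pulls on arm 1 as the estimates for the other arms grow.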
