On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
Coordinating multiple agents that can communicate with each other to make decisions under uncertainty is a classical problem with many applications in computer science (Lynch, 1996), game theory (Chakravarty et al., 2014) and machine learning (Lanctot et al., 2017). We consider the multi-agent version of the multi-armed bandit problem, one of the most fundamental decision-making problems under uncertainty. In this problem, a learning agent must manage the exploration-exploitation trade-off, i.e. balance exploring the various actions in order to learn how rewarding they are against selecting high-rewarding actions. In the multi-agent version of this problem, multiple agents collaborate, each trying to maximize its individual cumulative reward, and the challenge is to design efficient cooperative algorithms under communication constraints. We consider the nonstochastic (adversarial) multi-armed bandit problem in a cooperative multi-agent setting, with K ≥ 2 arms and N ≥ 1 agents.
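To make the nonstochastic bandit setting concrete, here is a minimal single-agent sketch of the classic EXP3 strategy (exponential weights with importance-weighted reward estimates). This is only an illustration of the underlying problem, not the paper's cooperative multi-agent algorithm; the function name `exp3` and the adversary interface `reward_fn(t, arm)` are assumptions made for this example.

```python
import math
import random

def exp3(K, T, reward_fn, gamma=0.1):
    """Single-agent EXP3 for the nonstochastic (adversarial) bandit.

    K          number of arms (K >= 2)
    T          number of rounds
    reward_fn  callable (t, arm) -> reward in [0, 1], fixed by an adversary
    gamma      exploration rate in (0, 1]

    Returns the total reward collected over T rounds.
    """
    weights = [1.0] * K
    total = 0.0
    for t in range(T):
        ws = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / ws + gamma / K for w in weights]
        arm = random.choices(range(K), weights=probs)[0]
        r = reward_fn(t, arm)
        total += r
        # Importance-weighted estimate: only the pulled arm is updated,
        # keeping the estimate unbiased despite bandit feedback.
        est = r / probs[arm]
        weights[arm] *= math.exp(gamma * est / K)
    return total

# Example: an oblivious adversary where arm 0 is always best.
random.seed(0)
reward = exp3(K=2, T=1000, reward_fn=lambda t, a: 1.0 if a == 0 else 0.0)
```

In the cooperative setting studied by the paper, each of the N agents faces this same adversarial bandit, and communication lets agents share reward estimates to reduce their individual regret.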
Oct-21-2023