Decentralized Cooperative Stochastic Multi-armed Bandits

David Martínez-Rubio, Varun Kanade, Patrick Rebeschini

arXiv.org Machine Learning 

We study a decentralized cooperative stochastic multi-armed bandit problem with $K$ arms on a network of $N$ agents. In our model, the reward distribution of each arm is agent-independent. Each agent iteratively chooses one arm to play and then communicates with her neighbors. The aim is to minimize the total network regret. We design a fully decentralized algorithm that uses a running consensus procedure to compute, with some delay, accurate estimates of the average of the rewards obtained by all agents for each arm, and then uses an upper confidence bound algorithm that accounts for the delay and error of these estimates. We analyze the algorithm and, up to a constant, our regret bounds are better for all networks than those of other algorithms designed to solve the same problem. For some graphs, our regret bounds are significantly better.
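The following is a minimal illustrative sketch of the two ingredients the abstract describes, not the authors' actual algorithm or analysis: each agent maintains running-consensus (gossip) estimates of the network-wide per-arm reward totals and pull counts, and plays the arm maximizing a UCB index whose exploration bonus is widened to account for the delay of the consensus estimates. The ring network, the Metropolis mixing weights, the delay parameter d, and the specific form of the delay correction are assumptions made for the example only.

import numpy as np

rng = np.random.default_rng(0)

N, K, T = 4, 3, 2000                  # agents, arms, horizon
means = np.array([0.2, 0.5, 0.8])     # true arm means (same for every agent)

# Ring network with doubly stochastic Metropolis gossip matrix W (assumed topology).
adj = np.zeros((N, N))
for i in range(N):
    adj[i, (i + 1) % N] = adj[i, (i - 1) % N] = 1.0
W = adj / 3.0
np.fill_diagonal(W, 1.0 - W.sum(axis=1))

d = 5                      # assumed mixing delay used to widen the confidence bonus
S = np.zeros((N, K))       # each agent's estimate of (1/N) * total reward per arm
C = np.zeros((N, K))       # each agent's estimate of (1/N) * total pulls per arm
regret = 0.0

for t in range(1, T + 1):
    new_rewards = np.zeros((N, K))
    new_counts = np.zeros((N, K))
    for i in range(N):
        if t <= K:
            a = t - 1                                  # play each arm once to initialize
        else:
            n = np.maximum(N * C[i], 1e-9)             # estimated network-wide pull counts
            mean_est = np.divide(S[i], C[i], out=np.zeros(K), where=C[i] > 0)
            bonus = np.sqrt(2.0 * np.log(N * t) / n)   # standard UCB exploration term
            bonus *= np.sqrt(1.0 + d / max(t - d, 1))  # crude delay widening (assumption)
            a = int(np.argmax(mean_est + bonus))
        r = rng.binomial(1, means[a])                  # Bernoulli reward
        regret += means.max() - means[a]
        new_rewards[i, a] += r
        new_counts[i, a] += 1.0
    # Running consensus: inject local observations (scaled by 1/N), then mix with neighbors.
    S = W @ (S + new_rewards / N)
    C = W @ (C + new_counts / N)

print(f"total network regret after {T} rounds: {regret:.1f}")

In this sketch every agent converges toward the network-wide empirical mean of each arm through repeated multiplication by W, while the widened bonus keeps exploration conservative until those estimates have had time to mix; the paper's algorithm handles the delay and estimation error with explicit guarantees rather than the heuristic correction used here.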
