Distributed Thompson Sampling

Dong, Jing, Li, Tan, Ren, Shaolei, Song, Linqi

Dec-3-2020–arXiv.org Artificial Intelligence

We study a cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimized the cumulative regret. We adapt a traditional Thompson Sampling algoirthm under the distributed setting. However, with agent's ability to communicate, we note that communication may further reduce the upper bound of the regret for a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively. We analyse the algorithm under Bernoulli reward and derived a problem dependent upper bound on the cumulative regret.

agent, stage 1, thompson sampling, (13 more...)

arXiv.org Artificial Intelligence

Dec-3-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Michigan (0.04)
  - California > Riverside County
    - Riverside (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > China
  - Hong Kong (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.69)
  - Artificial Intelligence > Representation & Reasoning
    - Agents (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found