Distributed Thompson Sampling
Dong, Jing, Li, Tan, Ren, Shaolei, Song, Linqi
–arXiv.org Artificial Intelligence
We study a cooperative multi-agent multi-armed bandits with M agents and K arms. The goal of the agents is to minimized the cumulative regret. We adapt a traditional Thompson Sampling algoirthm under the distributed setting. However, with agent's ability to communicate, we note that communication may further reduce the upper bound of the regret for a distributed Thompson Sampling approach. To further improve the performance of distributed Thompson Sampling, we propose a distributed Elimination based Thompson Sampling algorithm that allow the agents to learn collaboratively. We analyse the algorithm under Bernoulli reward and derived a problem dependent upper bound on the cumulative regret.
arXiv.org Artificial Intelligence
Dec-3-2020
- Country:
- Asia > China
- Hong Kong (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- California > Riverside County
- Riverside (0.04)
- Michigan (0.04)
- California > Riverside County
- Asia > China
- Genre:
- Research Report (0.50)
- Technology: