Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Bubeck, Sébastien, Li, Yuanzhi, Peres, Yuval, Sellke, Mark

May 1, 2019, arXiv.org Machine Learning

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.
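To make the feedback model concrete, here is a minimal sketch of one round of play. It is an illustration only, not the authors' algorithm: the function name `play_round`, the constant `MAX_LOSS`, and the assumption that losses are normalized to the interval [0, 1] are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumption: losses are normalized to [0, 1], so the maximal loss is 1.
MAX_LOSS = 1.0


def play_round(
    losses: Sequence[float], actions: Sequence[int]
) -> Tuple[List[float], List[bool]]:
    """Simulate one round of the multi-player bandit with collisions.

    losses[a] is the (adversarially chosen) loss of arm a this round.
    actions[i] is the arm selected by player i, chosen without communication.
    Returns the loss incurred by each player and a per-player collision flag.
    """
    # Count how many players picked each arm.
    counts = Counter(actions)
    # A player collides when at least one other player chose the same arm.
    collided = [counts[a] > 1 for a in actions]
    # Colliding players suffer the maximal loss instead of the arm's loss.
    incurred = [MAX_LOSS if c else losses[a] for a, c in zip(actions, collided)]
    return incurred, collided


# Three arms, three players; players 1 and 2 both pick arm 1 and collide.
incurred, collided = play_round([0.2, 0.5, 0.9], [0, 1, 1])
```

In the first feedback model each player would observe both its entry of `incurred` and its entry of `collided`. In the second, a player would see only its entry of `incurred`, so a collision cannot be told apart from an arm whose loss happens to be maximal.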
