Reviews: Distributed Multi-Player Bandits - a Game of Thrones Approach

Neural Information Processing Systems 

This paper studies a distributed multi-armed bandits setting, where every round each of N players must choose one of K arms. If a player picks the same arm as some other player, they receive payoff zero; otherwise, they receive payoff drawn from some distribution (specific to that player and that arm). This can be thought of as learning a distributed allocation or matching from players to arms. The goal is to design a communication-free learning algorithm that maximizes the total overall utility of all the players (or alternatively, minimizes their regret with respect to the best fixed allocation). The authors design an algorithm which receives total regret of O(log 2 T).