Multi-Player Approaches for Dueling Bandits

Or Raveh, Junya Honda, Masashi Sugiyama

arXiv.org Machine Learning 

In decision-making under uncertainty, multi-armed bandit (MAB) [4] problems are a key paradigm with applications in recommendation systems and online advertising. These problems entail balancing the exploration-exploitation trade-off, as an agent draws from a set of K arms with unknown reward distributions to maximize cumulative reward or minimize regret over time. Two notable variations of MAB are the dueling-bandit problem and the cooperative multiplayer MAB problem. In the dueling-bandit scenario [36], feedback comes from pairwise comparisons between K arms, which is useful in human-feedback-driven tasks such as ranker evaluation [25] and preference-based recommendation systems [10]. Meanwhile, the cooperative multiplayer MAB focuses on a group of M players collaboratively solving challenges in a distributed decision-making environment, enhancing learning through shared information. This approach finds applications in fields like multi-robot systems [19] and distributed recommender systems [27]. The M-player K-arm cooperative dueling bandit problem, combining aspects of the two previously studied variations, introduces a new dimension to cooperative decision-making with preference-based feedback, yet remains unexplored to the best of our knowledge.
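To make the dueling-bandit feedback model concrete, the following is a minimal sketch (not the paper's algorithm): each round compares two arms and observes only which one wins, governed by an assumed preference matrix `P` where `P[i][j]` is the probability that arm `i` beats arm `j`. A naive uniform-exploration baseline then picks the arm with the most empirical wins (a Borda-style winner).

```python
import random

# Hypothetical preference matrix for 3 arms: P[i][j] = Pr(arm i beats arm j).
# Arm 0 is the strongest here by construction.
P = [[0.5, 0.6, 0.7],
     [0.4, 0.5, 0.6],
     [0.3, 0.4, 0.5]]

def duel(i, j, rng):
    """Simulate one pairwise comparison; return the winning arm's index."""
    return i if rng.random() < P[i][j] else j

def borda_explore(rounds=3000, seed=0):
    """Uniformly explore random arm pairs and return the arm with the
    most wins -- a naive illustrative baseline, not a regret-optimal method."""
    rng = random.Random(seed)
    k = len(P)
    wins = [0] * k
    for _ in range(rounds):
        i, j = rng.sample(range(k), 2)  # draw a pair of distinct arms
        wins[duel(i, j, rng)] += 1
    return max(range(k), key=lambda a: wins[a])

winner = borda_explore()
```

In the multiplayer extension studied here, M such learners would share comparison outcomes to speed up identifying the winning arm; the single-player sketch above only illustrates the relative (preference-based) feedback that replaces numeric rewards.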
