Multi-Player Approaches for Dueling Bandits
Or Raveh, Junya Honda, Masashi Sugiyama
In decision-making under uncertainty, multi-armed bandit (MAB) [4] problems are a key paradigm with applications in recommendation systems and online advertising. These problems entail balancing the exploration-exploitation trade-off, as an agent draws from a set of K arms with unknown reward distributions to maximize cumulative rewards or, equivalently, to minimize regret over time. Two notable variations of MAB are the dueling-bandit problem and the cooperative multiplayer MAB problem. In the dueling-bandit scenario [36], feedback comes from pairwise comparisons between the K arms, which is useful in human-feedback-driven tasks such as ranker evaluation [25] and preference-based recommendation systems [10]. Meanwhile, the cooperative multiplayer MAB focuses on a group of M players collaboratively solving a bandit problem in a distributed decision-making environment, enhancing learning through shared information. This approach finds applications in fields such as multi-robot systems [19] and distributed recommender systems [27]. The M-player K-arm cooperative dueling bandit problem, combining aspects of these two previously studied variations, introduces a new dimension to cooperative decision-making with preference-based feedback, yet remains unexplored to the best of our knowledge.
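To make the problem setting concrete, the following is a minimal sketch of the M-player K-arm dueling-bandit feedback model described above. It is not the paper's algorithm; it only simulates the environment, assuming a hypothetical Bernoulli preference matrix P (where P[i, j] is the probability that arm i beats arm j) and placeholder uniform exploration by each player.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5   # number of arms
M = 3   # number of cooperating players

# Hypothetical preference matrix: P[i, j] + P[j, i] = 1, diagonal fixed at 0.5.
P = np.full((K, K), 0.5)
upper = rng.uniform(0.5, 0.9, size=(K, K))
iu = np.triu_indices(K, k=1)
P[iu] = upper[iu]
P[(iu[1], iu[0])] = 1.0 - upper[iu]

def duel(i: int, j: int) -> int:
    """Return the winner of a single pairwise comparison between arms i and j."""
    return i if rng.random() < P[i, j] else j

# Each player picks a pair of arms per round and observes only the duel outcome;
# in a cooperative setting these outcomes would be shared among the M players.
T = 1000
wins = np.zeros(K)
plays = np.zeros(K)
for t in range(T):
    for m in range(M):
        i, j = rng.choice(K, size=2, replace=False)  # placeholder uniform exploration
        w = duel(i, j)
        wins[w] += 1
        plays[i] += 1
        plays[j] += 1

print("empirical win rate per arm:", wins / np.maximum(plays, 1))
```

A cooperative algorithm would replace the uniform arm selection with a strategy that exploits the pooled comparison outcomes across players to reduce each player's regret.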
May-25-2024