Strategic Arms with Side Communication Prevail Over Low-Regret MAB Algorithms

Yahmed, Ahmed Ben, Calauzènes, Clément, Perchet, Vianney

arXiv.org Artificial Intelligence 

In the strategic multi-armed bandit setting, when arms possess perfect information about the player's behavior, they can establish an equilibrium where: 1. they retain almost all of their value, 2. they leave the player with a substantial (linear) regret. This study illustrates that, even if complete information is not publicly available to all arms but is shared among them, it is possible to achieve a similar equilibrium. The primary challenge lies in designing a communication protocol that incentivizes the arms to communicate ...

It significantly extends the standard MAB problem, as arms can utilize this reporting mechanism to influence the player's decisions. For example, arms may opt to report higher values initially to increase their chances of being selected in later rounds. Conversely, they may report lower values at the outset to decrease the reserve price in auctions [3]. Furthermore, our study takes into consideration the existence of side communications among arms, governed by predefined rules. This consideration is motivated by real-world scenarios in which such interactions are prevalent and influential.
