Strategic Arms with Side Communication Prevail Over Low-Regret MAB Algorithms
Yahmed, Ahmed Ben, Calauzènes, Clément, Perchet, Vianney
arXiv.org Artificial Intelligence
In the strategic multi-armed bandit setting, when arms possess perfect information about the player's behavior, they can establish an equilibrium where: (1) they retain almost all of their value, (2) they leave the player with a substantial (linear) regret. This study illustrates that, even if complete information is not publicly available to all arms but is shared among them, it is possible to achieve a similar equilibrium. The primary challenge lies in designing a communication protocol that incentivizes the arms to communicate.

It significantly extends the standard MAB problem, as arms can utilize this reporting mechanism to influence the player's decisions. For example, arms may opt to report higher values initially to increase their chances of being selected in later rounds. Conversely, they may report lower values at the outset to decrease the reserve price in auctions [3]. Furthermore, our study takes into consideration the existence of side communications among arms, governed by predefined rules. This consideration is motivated by real-world scenarios in which such interactions are prevalent and influential.
Aug-30-2024
- Country:
- North America > United States
- New Jersey > Middlesex County > Piscataway (0.04)
- Europe > France
- Île-de-France > Paris > Paris (0.04)
- Asia > South Korea
- Genre:
- Research Report (0.40)
- Technology: