Distributed Multi-Player Bandits - a Game of Thrones Approach
–Neural Information Processing Systems
We consider a multi-armed bandit game where N players compete for K arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed. Performance is measured using the expected sum of regrets, compared to the optimal assignment of arms to players. We assume that each player only knows her actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible.
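The performance measure above can be made concrete with a small sketch. The names `optimal_assignment` and `expected_sum_of_regrets`, and the matrix `mu` of expected rewards, are hypothetical and chosen for illustration. The sketch assumes N ≤ K and that players who pick the same arm collide and receive zero reward; the text above does not state this collision rule. The brute-force search is only meant for small instances and is not the distributed algorithm the paper proposes.

```python
from itertools import permutations


def optimal_assignment(mu):
    """Return (best_value, best_arms) over all assignments of distinct arms to players.

    mu[n][k] is the expected reward of player n on arm k; assumes N <= K.
    Brute force over all assignments, so only suitable for small N and K.
    """
    n_players, n_arms = len(mu), len(mu[0])
    best_value, best_arms = float("-inf"), None
    for arms in permutations(range(n_arms), n_players):
        value = sum(mu[n][arms[n]] for n in range(n_players))
        if value > best_value:
            best_value, best_arms = value, arms
    return best_value, best_arms


def expected_sum_of_regrets(mu, joint_actions):
    """Expected total regret of a sequence of joint actions.

    joint_actions[t][n] is the arm player n pulls at turn t.
    Assumption (not from the source text): players choosing the same arm
    collide and earn zero expected reward on that turn.
    """
    best_value, _ = optimal_assignment(mu)
    collected = 0.0
    for arms in joint_actions:
        for n, k in enumerate(arms):
            if arms.count(k) == 1:  # no other player chose arm k this turn
                collected += mu[n][k]
    return best_value * len(joint_actions) - collected


# Two players, three arms, three turns; the second turn is a collision on arm 0.
mu = [[0.9, 0.2, 0.5],
      [0.8, 0.7, 0.1]]
regret = expected_sum_of_regrets(mu, [(0, 1), (0, 0), (2, 1)])
```

For larger N and K, the brute-force search could be replaced by a polynomial-time assignment solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True`.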
Nov-20-2025, 19:49:05 GMT