Multi-armed Bandits with Compensation

Siwei Wang, Longbo Huang

Neural Information Processing Systems 

We propose and study the known-compensation multi-armed bandit (KCMAB) problem, in which a system controller offers a set of arms to many short-term players over T steps. In each step, one short-term player arrives at the system. Upon arrival, the player aims to select the arm with the current best average reward and receives a stochastic reward associated with that arm. To incentivize players to explore other arms, the controller provides payment compensation to players. The objective of the controller is to maximize the total reward collected by players while minimizing the total compensation.
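The interaction described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `simulate_kcmab`, the Bernoulli rewards, the UCB1-style rule used by the controller, and the choice to pay no compensation during the initial one-pull-per-arm rounds are all hypothetical. The compensation is assumed to equal the gap between the best empirical mean and the empirical mean of the arm the controller wants pulled, which is one natural reading of "incentivizing" a player who would otherwise act greedily.

```python
import math
import random


def simulate_kcmab(means, horizon, seed=0):
    """Simulate a controller steering greedy short-term players (illustrative only).

    means   -- assumed true Bernoulli reward probabilities, one per arm
    horizon -- number of steps T (one player arrives per step)
    Returns (total_reward, total_compensation, pull_counts).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # accumulated reward per arm
    total_reward = 0.0
    total_compensation = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Assumption: each arm is pulled once up front, with no payment,
            # because no empirical means exist yet.
            arm = t - 1
        else:
            emp = [sums[i] / counts[i] for i in range(k)]
            # Controller's target arm: a UCB1-style index (an assumed policy).
            ucb = [emp[i] + math.sqrt(2.0 * math.log(t) / counts[i])
                   for i in range(k)]
            arm = max(range(k), key=lambda i: ucb[i])
            # A greedy player would pick the arm with the best empirical mean;
            # the assumed payment covers the gap to the controller's target arm.
            total_compensation += max(emp) - emp[arm]

        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    return total_reward, total_compensation, counts


if __name__ == "__main__":
    reward, compensation, counts = simulate_kcmab([0.2, 0.5, 0.8], horizon=2000)
    print("total reward:", reward)
    print("total compensation:", round(compensation, 3))
    print("pulls per arm:", counts)
```

When the controller's target arm coincides with the greedy arm, the payment for that step is zero, so compensation accrues only on exploratory steps. Swapping in a different controller policy changes only the line that selects `arm`.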
