Siwei Wang
Multi-armed Bandits with Compensation
Siwei Wang, Longbo Huang
We propose and study the known-compensation multi-armed bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for T steps. In each step, one short-term player arrives at the system. Upon arrival, the player aims to select an arm with the current best average reward and receives a stochastic reward associated with the arm. In order to incentivize players to explore other arms, the controller provide proper payment compensations to players. The objective of the controller is to maximize the total reward collected by players while minimizing the total compensation.
Multi-armed Bandits with Compensation
Siwei Wang, Longbo Huang
We propose and study the known-compensation multi-armed bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for T steps. In each step, one short-term player arrives at the system. Upon arrival, the player aims to select an arm with the current best average reward and receives a stochastic reward associated with the arm. In order to incentivize players to explore other arms, the controller provide proper payment compensations to players. The objective of the controller is to maximize the total reward collected by players while minimizing the total compensation.