Multi-armed Bandits with Compensation

May-26-2025, 08:08:55 GMT–Neural Information Processing Systems

We propose and study the known-compensation multi-armed bandit (KCMAB) problem, in which a system controller offers a set of arms to many short-term players over T steps. In each step, one short-term player arrives at the system. Upon arrival, the player aims to select the arm with the current best average reward and receives a stochastic reward associated with that arm. To incentivize players to explore other arms, the controller provides appropriate compensation payments to players. The objective of the controller is to maximize the total reward collected by players while minimizing the total compensation.
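To make the setting concrete, here is a minimal simulation sketch of the interaction described above. Several details are assumptions not stated in the abstract: rewards are Bernoulli, the controller picks its target arm with a standard UCB1 index, and the compensation paid in a step is taken to be the gap between the best empirical mean and the empirical mean of the arm the controller wants pulled (zero when the player would pick that arm anyway). The function name `simulate_kcmab` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_kcmab(means, horizon, seed=0):
    """Toy KCMAB run: greedy short-term players, UCB1 controller (assumed)."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    total_reward = 0.0
    total_compensation = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initial phase: have each arm pulled once.
            target = t - 1
        else:
            # Assumed controller policy: UCB1 index over empirical means.
            target = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )

        # Empirical means as seen by the arriving player (0 if never pulled).
        emp = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]

        # Assumed compensation rule: pay the player the gap between the
        # arm they would pick greedily and the arm the controller wants.
        total_compensation += max(emp) - emp[target]

        # The player pulls the target arm and observes a Bernoulli reward.
        reward = 1.0 if rng.random() < means[target] else 0.0
        counts[target] += 1
        sums[target] += reward
        total_reward += reward

    return total_reward, total_compensation, counts
```

Calling `simulate_kcmab([0.2, 0.5, 0.8], 500)` returns the total reward, the total compensation paid, and the per-arm pull counts, which lets one see how the two quantities in the controller's objective trade off under this particular (assumed) policy.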

artificial intelligence, data mining, machine learning

Country:
- North America > Canada (0.14)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (1.00)
