Global Rewards in Restless Multi-Armed Bandits

Raman, Naveen, Shi, Zheyuan Ryan, Fang, Fei

Jun-7-2024–arXiv.org Artificial Intelligence

Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless-multi-armed bandit with global rewards (RMAB-G), a generalization of RMABs to global non-separable rewards. To solve RMAB-G, we develop the Linear- and Shapley-Whittle indices, which extend Whittle indices from RMABs to RMAB-Gs. We prove approximation bounds but also point out how these indices could fail when reward functions are highly non-linear. To overcome this, we propose two sets of adaptive policies: the first computes indices iteratively, and the second combines indices with Monte-Carlo Tree Search (MCTS). Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue.

probability, rmab, transition, (16 more...)

arXiv.org Artificial Intelligence

Jun-7-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report > New Finding (0.67)

Industry:
- Health & Medicine (0.67)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (1.00)
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning > Search (0.87)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found