Global Rewards in Restless Multi-Armed Bandits
Neural Information Processing Systems
We prove approximation bounds but also show how these indices can fail when reward functions are highly non-linear. To overcome this, we propose two sets of adaptive policies: the first computes indices iteratively, and the second combines indices with Monte-Carlo Tree Search (MCTS). Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies on synthetic data and on real-world data from food rescue.
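The failure mode described above can be illustrated with a toy, single-step sketch. This is not the paper's algorithm: it has no states or transitions, and the arm names, the `COVERAGE` table, and both policy functions are hypothetical and chosen only for illustration. It assumes a coverage-style global reward, which is one simple example of a non-separable reward, and compares scoring each arm once in isolation with re-scoring arms by marginal gain after every selection.

```python
# Hypothetical toy data: the set of items each arm covers when pulled.
COVERAGE = {
    "a": {1, 2, 3},
    "b": {1, 2, 3},
    "c": {4, 5},
    "d": {6},
}


def global_reward(pulled):
    """Non-separable reward: number of distinct items covered by the pulled arms."""
    covered = set()
    for arm in pulled:
        covered |= COVERAGE[arm]
    return len(covered)


def separable_policy(budget):
    """Score each arm in isolation once, then pull the top `budget` arms."""
    scores = {arm: global_reward([arm]) for arm in COVERAGE}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(COVERAGE, key=lambda arm: (-scores[arm], arm))
    return ranked[:budget]


def iterative_policy(budget):
    """Re-score the remaining arms by marginal gain after every selection."""
    chosen = []
    for _ in range(budget):
        remaining = [arm for arm in COVERAGE if arm not in chosen]
        base = global_reward(chosen)
        # max() keeps the first arm among ties, following COVERAGE's order.
        best = max(remaining, key=lambda arm: global_reward(chosen + [arm]) - base)
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    for policy in (separable_policy, iterative_policy):
        arms = policy(2)
        print(policy.__name__, arms, global_reward(arms))
```

With a budget of two, the isolated scores favor arms `a` and `b`, which cover the same items, while the iterative variant picks `c` second because `b` no longer adds anything once `a` is chosen.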
May-28-2025, 22:11:31 GMT
- Country:
  - North America > United States (0.14)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.92)
- Industry:
  - Health & Medicine > Public Health (0.46)
- Technology: