PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models
arXiv.org Artificial Intelligence
In this paper, we consider a general observation model for restless multi-armed bandit problems. The player must act on a feedback mechanism that is error-prone because of resource constraints or environmental or intrinsic noise. By establishing a general probabilistic model for the dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with the partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process that transforms the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm achieves excellent performance.
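To make the idea of a belief state under error-prone feedback concrete, here is a minimal sketch. It is not the paper's model. It assumes a single two-state Markov arm with a binary observation, and every name and parameter in it (`predict`, `update`, `p01`, `p11`, `eps_miss`, `eps_false`) is hypothetical and chosen for illustration only.

```python
def predict(belief, p01, p11):
    """One-step Markov prediction of P(state = 1).

    p01 and p11 are assumed transition probabilities into state 1
    from state 0 and from state 1, respectively.
    """
    return belief * p11 + (1.0 - belief) * p01


def update(belief, observation, eps_miss, eps_false, p01, p11):
    """Bayes update on a noisy binary observation, then Markov prediction.

    Assumed error model (illustrative, not taken from the paper):
      eps_miss  = P(observe 0 | state 1)
      eps_false = P(observe 1 | state 0)
    """
    if observation == 1:
        like1, like0 = 1.0 - eps_miss, eps_false
    else:
        like1, like0 = eps_miss, 1.0 - eps_false
    evidence = belief * like1 + (1.0 - belief) * like0
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    posterior = belief * like1 / evidence
    return predict(posterior, p01, p11)


if __name__ == "__main__":
    # Starting from one initial belief, each observation sequence leads to
    # one belief value, so the reachable belief set is countable.
    b = 0.5
    for obs in (1, 0, 1):
        b = update(b, obs, eps_miss=0.1, eps_false=0.1, p01=0.2, p11=0.8)
        print(b)
```

Under these assumptions, the belief after any finite history is a deterministic function of the initial belief and the observation sequence, which is one way to see why the reachable belief space is countable rather than a continuum.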
Jul-6-2023
- Country:
- Asia > China
- Jiangsu Province > Nanjing (0.04)
- North America > United States
- New York (0.04)
- Genre:
- Research Report > Experimental Study (0.46)