Low-Complexity Algorithm for Restless Bandits with Imperfect Observations
Keqin Liu, Richard Weber, Ting Wu, Chengzhong Zhang
arXiv.org Artificial Intelligence
We consider a class of restless bandit problems with broad applications in stochastic optimization, reinforcement learning, and operations research. There are $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 (`good' and `bad'). Reward accrues only when a process is both in state 1 and observed to be so. The aim is to maximize the expected discounted sum of rewards over the infinite horizon, subject to the constraint that only $M$ $(M < N)$ of the processes may be observed at each time step.
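The model sketched in the abstract is a partially observed two-state Markov chain per arm, so the natural sufficient statistic is a belief (the probability the arm is in state 1). The sketch below illustrates a generic Bayesian belief update for one such arm; it is not the paper's algorithm, and the transition probabilities `p01`, `p11` and the detection probability `rho` (the chance a good state is correctly observed as good) are illustrative assumptions.

```python
# Hedged sketch (not the paper's method): belief evolution for one
# two-state Markov arm with imperfect observation of the good state.

def predict(omega, p01, p11):
    """One-step-ahead belief that the arm is in state 1.

    omega : current belief of being in state 1
    p01   : P(next state 1 | current state 0)  -- assumed parameter
    p11   : P(next state 1 | current state 1)  -- assumed parameter
    """
    return omega * p11 + (1.0 - omega) * p01

def update(omega, observed_good, p01, p11, rho):
    """Belief after observing the arm, then one Markov transition.

    A 'good' observation occurs with probability rho given state 1
    (assumed imperfect-observation model); a miss looks like 'bad'.
    """
    if observed_good:
        posterior = 1.0  # a positive observation confirms state 1
    else:
        # state 1 but missed, versus genuinely state 0
        posterior = (omega * (1.0 - rho)
                     / (omega * (1.0 - rho) + (1.0 - omega)))
    return predict(posterior, p01, p11)
```

With a perfect sensor (`rho = 1`), a 'bad' observation drives the posterior to 0 and the next belief to `p01`, recovering the standard perfectly observed restless-bandit belief recursion as a special case.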
Aug-9-2022