Low-Complexity Algorithm for Restless Bandits with Imperfect Observations

Liu, Keqin, Weber, Richard, Wu, Ting, Zhang, Chengzhong

arXiv.org Artificial Intelligence 

We consider a class of restless bandit problems with broad applications in stochastic optimization, reinforcement learning, and operations research. The model comprises $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 (`good' and `bad'). Reward accrues only if a process is both in state 1 and observed to be so. The aim is to maximize the expected discounted sum of returns over the infinite horizon, subject to the constraint that only $M$ $(
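Because the processes are observed imperfectly, a policy must act on a belief (the conditional probability that a process is in state 1) rather than on the true state. The following is a minimal sketch of the standard belief recursion for one such two-state process; the transition probabilities `p01`, `p11` and the observation error rates `eps0`, `eps1` are illustrative assumptions, not values from the paper.

```python
def predict(omega, p01, p11):
    """One-step prior: P(state = 1 at t+1) given belief omega at t.

    p01 = P(1 | previous state 0), p11 = P(1 | previous state 1);
    both are assumed parameters for illustration.
    """
    return omega * p11 + (1.0 - omega) * p01


def bayes_update(omega, obs, eps0, eps1):
    """Posterior P(state = 1) after an imperfect observation.

    eps1 = P(observe 0 | true state 1) -- missed detection,
    eps0 = P(observe 1 | true state 0) -- false alarm.
    """
    if obs == 1:
        num = omega * (1.0 - eps1)
        den = num + (1.0 - omega) * eps0
    else:
        num = omega * eps1
        den = num + (1.0 - omega) * (1.0 - eps0)
    return num / den
```

Under perfect observations (`eps0 = eps1 = 0`) the posterior collapses to 0 or 1, recovering the fully observed restless bandit as a special case.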
