Restless Hidden Markov Bandits with Linear Rewards

Yemini, Michal, Leshem, Amir, Somekh-Baruch, Anelia

arXiv.org Machine Learning 

This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem the reward received by the decision maker is a random linear function which depends on the arm selected and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information regarding the state of the system, but has to infer it based on its actions and the received reward. Surprisingly, we can still maintain logarithmic regret in the case of polyhedral action set. Furthermore, the regret does not depend on the number of extreme points in the action space.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found