Restless Hidden Markov Bandits with Linear Rewards

Yemini, Michal, Leshem, Amir, Somekh-Baruch, Anelia

Oct-22-2019–arXiv.org Machine Learning

This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem, the reward received by the decision maker is a random linear function that depends on the selected arm and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information about the state of the system; instead, it must infer the state from its actions and the rewards received. Surprisingly, we can still maintain logarithmic regret when the action set is polyhedral. Furthermore, the regret does not depend on the number of extreme points in the action space.
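
To make the problem setting concrete, here is a minimal toy simulation of a restless hidden-state bandit with linear rewards. It is only a sketch under stated assumptions and is not the paper's algorithm: the function name `simulate`, the per-state parameter vectors `thetas`, the Gaussian noise, and the fixed initial state are all hypothetical choices made for illustration. The hidden state follows a Markov chain that evolves regardless of the chosen action (the "restless" part), and the decision maker would only ever see the rewards.

```python
import random


def simulate(actions, transition, thetas, noise_std=0.1, seed=0):
    """Play a fixed action sequence in a toy restless hidden-state linear bandit.

    Assumptions (illustrative, not taken from the paper):
      - transition[i][j] is the probability of moving from state i to state j
      - thetas[s] is the reward parameter vector while the hidden state is s
      - reward = <thetas[state], action> + zero-mean Gaussian noise
    """
    rng = random.Random(seed)
    state = 0  # initial hidden state; never revealed to the decision maker
    rewards, states = [], []
    for x in actions:
        theta = thetas[state]
        # Linear reward in the chosen action, parameterised by the hidden state.
        mean = sum(t * xi for t, xi in zip(theta, x))
        rewards.append(mean + rng.gauss(0.0, noise_std))
        states.append(state)
        # Restless: the state evolves whatever action was played.
        state = rng.choices(range(len(transition)), weights=transition[state])[0]
    return rewards, states


if __name__ == "__main__":
    transition = [[0.9, 0.1], [0.2, 0.8]]   # two hidden states
    thetas = [[1.0, 0.0], [0.0, 1.0]]       # each state favours a different coordinate
    actions = [[0.5, 0.5]] * 10             # a point inside a polyhedral action set
    rewards, _hidden = simulate(actions, transition, thetas)
    print([round(r, 3) for r in rewards])
```

The returned `states` list exists only so the simulation can be inspected; a learner in this setting would have to infer the state from the `rewards` it observes.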

Country:
- North America > United States
  - Rhode Island > Providence County
    - Providence (0.04)
  - California > Santa Clara County
    - Palo Alto (0.04)
- Asia > Middle East
  - Israel (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
