Whittle index based Q-learning for restless bandits with average reward

Avrachenkov, Konstantin, Borkar, Vivek S.

Mar-9-2021–arXiv.org Machine Learning

A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments that show excellent empirical performance of the proposed scheme.
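
For orientation, the Whittle index policy mentioned above is, in its standard form, a priority rule: at each step, activate the M arms whose current states have the largest indices. A minimal Python sketch of that selection step follows. The function name `select_arms`, the index table, and the toy numbers are illustrative assumptions, not taken from the paper, and the sketch assumes the indices are already known, whereas the paper is concerned with learning them.

```python
from typing import List, Sequence


def select_arms(
    indices: Sequence[Sequence[float]], states: Sequence[int], m: int
) -> List[int]:
    """Return the arms to activate under a Whittle-index priority rule.

    indices[a][s] is the (hypothetical, assumed known) index of arm a in state s.
    states[a] is the current state of arm a.
    """
    if not 0 <= m <= len(states):
        raise ValueError("m must be between 0 and the number of arms")
    # Look up the index of each arm in its current state.
    current = [indices[a][s] for a, s in enumerate(states)]
    # Rank arms by decreasing index; ties go to the lower arm number.
    ranked = sorted(range(len(states)), key=lambda a: (-current[a], a))
    return sorted(ranked[:m])


# Toy example: 3 arms with 2 states each; the numbers are made up.
toy_indices = [[0.1, 0.9], [0.4, 0.5], [0.2, 0.7]]
chosen = select_arms(toy_indices, states=[1, 0, 1], m=2)
```

The tie-breaking rule here is an arbitrary choice made so that the output is deterministic.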

algorithm, restless bandit, whittle index

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
