Whittle index based Q-learning for restless bandits with average reward
Avrachenkov, Konstantin; Borkar, Vivek S.
A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. A rigorous convergence analysis is provided, and numerical experiments show excellent empirical performance of the proposed scheme.
Mar-9-2021
- Country:
  - Oceania > New Zealand (0.04)
  - Europe
    - France (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Germany > Saarland
      - Saarbrücken (0.04)
  - Asia > India
    - Maharashtra > Mumbai (0.04)
- Genre:
  - Research Report (0.50)
- Technology: