Whittle index based Q-learning for restless bandits with average reward
