On learning Whittle index policy for restless bandits with scalable regret

Open in new window