On learning Whittle index policy for restless bandits with scalable regret