Whittle index based Q-learning for restless bandits with average reward