Reviews: Provably Efficient Q-Learning with Low Switching Cost

Neural Information Processing Systems 

They also present two variants of a Q-learning algorithm that achieve regret matching that of previous work, with the added benefit of a lower local switching cost.
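For readers unfamiliar with the term, the following is a minimal sketch of how a local switching cost could be counted. It assumes the usual tabular, episodic reading: the cost sums, over consecutive episodes, the number of (step, state) pairs at which the deployed policy's action changes. The function name `local_switching_cost` and the list-of-tables policy representation are illustrative choices of this sketch, not taken from the paper.

```python
from typing import List, Sequence

# Assumed representation: policy[h][s] is the action taken at step h in state s.
Policy = Sequence[Sequence[int]]


def local_switching_cost(policies: List[Policy]) -> int:
    """Count (step, state) pairs whose action differs between consecutive episodes."""
    cost = 0
    for prev, nxt in zip(policies, policies[1:]):
        for row_prev, row_next in zip(prev, nxt):
            # Each mismatch at a single (step, state) pair adds one to the cost.
            cost += sum(a != b for a, b in zip(row_prev, row_next))
    return cost


# Hypothetical toy data: three episodes, horizon 2, two states.
policies = [
    [[0, 1], [1, 1]],
    [[0, 1], [0, 1]],
    [[1, 1], [0, 0]],
]
print(local_switching_cost(policies))
```

Under this reading, an algorithm that updates its policy only occasionally keeps the count small even when the number of episodes is large, which appears to be the benefit the authors claim.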