Provably Efficient Q-Learning with Low Switching Cost