Reinforcement Learning with Logarithmic Regret and Policy Switches