Logarithmic Regret for Online KL-Regularized Reinforcement Learning