Fast Convergence of Policy Regret in Learning Stochastic Optimal Control