Fast Convergence of Policy Regret in Learning Stochastic Optimal Control
