Policy Optimization with Stochastic Mirror Descent
