On the model-based stochastic value gradient for continuous reinforcement learning

Open in new window