Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
