Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
