Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces