Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces