Review for NeurIPS paper: Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces