The Reinforce Policy Gradient Algorithm Revisited