The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning