Reinforcement Learning by Value Gradients