Policy gradients in linearly-solvable MDPs