Direct Gradient Temporal Difference Learning