Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator