On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation