Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains