A Non-asymptotic Analysis of Non-parametric Temporal-Difference Learning