Uncertainty quantification for Markov chains with application to temporal difference learning