Preferential Temporal Difference Learning