An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment