Theoretical and Experimental Comparison of Off-Policy Evaluation from Dependent Samples