Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation