Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Neural Information Processing Systems 

We show that our algorithm achieves a tighter error bound than the best-known result.