Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning
