Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
