Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Tengyang Xie, Yifei Ma, Yu-Xiang Wang

Neural Information Processing Systems 

Solving off-policy evaluation (OPE) is often the starting point in many RL applications. Importance sampling (IS) tackles OPE by correcting for the mismatch between the distributions induced by the behavior policy and the target policy.
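To make the correction concrete, the following is a minimal sketch of the standard trajectory-wise IS estimator, not the marginalized estimator the paper's title refers to. The function name `is_estimate`, the `(state, action, reward)` trajectory layout, and the policy signature `policy(state, action) -> probability` are assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

# Assumed layout: a trajectory is a sequence of (state, action, reward) steps.
Trajectory = Sequence[Tuple[int, int, float]]
# Assumed signature: policy(state, action) returns the action probability.
Policy = Callable[[int, int], float]


def is_estimate(
    trajectories: Sequence[Trajectory],
    behavior: Policy,
    target: Policy,
    gamma: float = 1.0,
) -> float:
    """Trajectory-wise importance sampling estimate of the target policy's value."""
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product of per-step probability ratios
        ret = 0.0     # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            # Reweight by how much more (or less) likely the target policy
            # is to take this action than the behavior policy was.
            weight *= target(s, a) / behavior(s, a)
            ret += (gamma ** t) * r
        total += weight * ret
    return total / len(trajectories)


# Hypothetical two-action example: data logged by a uniform behavior policy,
# evaluated for a target policy that always picks action 1.
def uniform(s: int, a: int) -> float:
    return 0.5


def always_one(s: int, a: int) -> float:
    return 1.0 if a == 1 else 0.0


logged = [[(0, 1, 1.0)], [(0, 0, 0.0)]]
value = is_estimate(logged, uniform, always_one)
```

In this toy case the estimate equals 1.0, the reward the target policy would always collect. Because the weight is a product over all steps of a trajectory, its variance can grow quickly with the horizon.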
