Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie, Yifei Ma, Yu-Xiang Wang
Neural Information Processing Systems
Off-policy evaluation (OPE) is often the starting point in many reinforcement learning (RL) applications. Importance sampling (IS) tackles OPE by correcting the mismatch between the distributions induced by the behavior policy and the target policy.
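As a rough illustration of how importance sampling corrects the distribution mismatch, the sketch below implements the ordinary trajectory-wise IS estimator for a tabular problem. It is not the marginalized estimator named in the title. The function name `is_estimate`, the array layout of the policies, and the toy one-state data are all assumptions made for this example.

```python
import numpy as np

def is_estimate(trajectories, pi_target, pi_behavior, gamma=1.0):
    """Ordinary trajectory-wise importance sampling estimate of the target policy's value.

    trajectories: list of episodes, each a list of (state, action, reward) tuples
                  collected under the behavior policy (hypothetical data layout).
    pi_target, pi_behavior: arrays of shape (n_states, n_actions) holding
                  action probabilities for each state.
    """
    estimates = []
    for episode in trajectories:
        weight = 1.0  # cumulative product of per-step probability ratios
        ret = 0.0     # discounted return of this episode
        for t, (s, a, r) in enumerate(episode):
            weight *= pi_target[s, a] / pi_behavior[s, a]
            ret += (gamma ** t) * r
        # Reweight the whole return by the trajectory's likelihood ratio.
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy one-state, two-action problem: action 0 pays 1, action 1 pays 0.
pi_behavior = np.array([[0.5, 0.5]])  # data was collected uniformly at random
pi_target = np.array([[1.0, 0.0]])    # target policy always picks action 0
episodes = [[(0, 0, 1.0)], [(0, 1, 0.0)]]
value = is_estimate(episodes, pi_target, pi_behavior)
```

In this toy case the target policy always earns reward 1, and the estimate recovers that value by doubling the weight of the episode that took action 0 and zeroing out the other. Because the weight is a product over time steps, its variance can grow quickly with the horizon.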
Feb-12-2026, 03:43:58 GMT
- Country:
  - Europe > Sweden
  - North America
    - Canada (0.04)
    - United States
      - California
        - San Mateo County > East Palo Alto (0.04)
        - Santa Barbara County > Santa Barbara (0.04)
        - Santa Clara County > Palo Alto (0.04)
      - Illinois > Champaign County
        - Urbana (0.04)
      - Massachusetts > Hampshire County
        - Amherst (0.04)
- Industry:
  - Health & Medicine (0.93)