Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie, Yifei Ma, Yu-Xiang Wang
Neural Information Processing Systems
Off-policy evaluation (OPE) is often the starting point in many RL applications. Importance sampling (IS) addresses OPE by correcting the mismatch between the distributions induced by the behavior policy and the target policy.
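To make the reweighting idea concrete, here is a minimal sketch of ordinary trajectory-wise importance sampling for a tabular problem. It is an illustration only, not the marginalized estimator named in the title. The function name `is_estimate`, the `(state, action, reward)` trajectory layout, and the nested-list policy tables are assumptions made for this example.

```python
def is_estimate(trajectories, pi_target, pi_behavior, gamma=1.0):
    """Ordinary importance sampling estimate of the target policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward)
                  tuples collected under the behavior policy (assumed layout).
    pi_target, pi_behavior: nested lists where pi[s][a] is the probability of
                  taking action a in state s (assumed tabular policies).
    gamma: discount factor.
    """
    estimates = []
    for traj in trajectories:
        weight = 1.0  # cumulative ratio of target to behavior probabilities
        ret = 0.0     # discounted return observed on this trajectory
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_target[s][a] / pi_behavior[s][a]
            ret += (gamma ** t) * r
        # Reweight the observed return so it is unbiased for the target policy.
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)


# Hypothetical one-state, two-action example: the behavior policy is uniform,
# the target policy always picks action 1, and only action 1 pays reward 1.
pi_b = [[0.5, 0.5]]
pi_t = [[0.0, 1.0]]
data = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
value = is_estimate(data, pi_t, pi_b)
```

Because the weight is a product of per-step ratios, its variance can grow quickly with the trajectory length, which is the usual motivation for looking at alternatives to this plain estimator.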
Feb-12-2026, 03:43:58 GMT
- Country:
  - North America
    - Canada (0.04)
    - United States
      - Massachusetts > Hampshire County
        - Amherst (0.04)
      - Illinois > Champaign County
        - Urbana (0.04)
      - California
        - Santa Clara County > Palo Alto (0.04)
        - Santa Barbara County > Santa Barbara (0.04)
        - San Mateo County > East Palo Alto (0.04)
  - Europe > Sweden
- Industry:
  - Health & Medicine (0.93)