Off-Policy Evaluation Under Nonignorable Missing Data

Wang, Han; Xu, Yang; Lu, Wenbin; Song, Rui

Jul-10-2025–arXiv.org Machine Learning

Off-Policy Evaluation (OPE) aims to estimate the value of a target policy using offline data collected by potentially different policies. In real-world applications, however, logged data often suffers from missingness. Although OPE has been studied extensively, a theoretical understanding of how missing data affect OPE results is still lacking. In this paper, we investigate OPE in the presence of monotone missingness and show theoretically that value estimates remain unbiased under ignorable missingness but can be biased under nonignorable (informative) missingness. To retain consistency of value estimation, we propose an inverse probability weighted value estimator and conduct statistical inference to quantify the uncertainty of the estimates. Through a series of numerical experiments, we demonstrate empirically that the proposed estimator yields more reliable value inference under missing data.
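
The abstract's central claim, that a naive estimator stays consistent when missingness is ignorable but drifts when missingness depends on the reward itself, and that inverse probability weighting restores consistency, can be illustrated with a small one-step simulation. This is a sketch under strong simplifying assumptions, not the authors' estimator: it uses a single decision step instead of the sequential monotone-missingness setting, a hypothetical Bernoulli reward model, and a response mechanism `p_obs` that is assumed known (in practice it would have to be modelled and estimated). The function names `simulate`, `complete_case_estimate` and `ipw_estimate` are hypothetical.

```python
import random


def simulate(n, p_obs, seed=0):
    """Generate one-step logged data as (action, reward, observed) tuples.

    Behavior policy: uniform over actions {0, 1}.
    Hypothetical reward model: P(r=1 | a=1) = 0.6, P(r=1 | a=0) = 0.3.
    p_obs(r) is the (assumed known) probability that reward r is recorded.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.randint(0, 1)  # behavior policy b(a) = 0.5
        r = 1 if rng.random() < (0.6 if a == 1 else 0.3) else 0
        observed = rng.random() < p_obs(r)
        data.append((a, r, observed))
    return data


def importance_weight(a):
    # Target policy always picks action 1; behavior picks it with prob 0.5.
    return (1.0 if a == 1 else 0.0) / 0.5


def complete_case_estimate(data):
    """Importance-sampling estimate over rows whose reward was observed."""
    num = sum(importance_weight(a) * r for a, r, obs in data if obs)
    den = sum(1 for _, _, obs in data if obs)
    return num / den


def ipw_estimate(data, p_obs):
    """Reweight each observed row by 1 / P(observed | r); rows with a
    missing reward contribute zero, and p_obs is assumed known."""
    total = sum(importance_weight(a) * r / p_obs(r) for a, r, obs in data if obs)
    return total / len(data)


def mcar(r):
    # Ignorable: missingness does not depend on the reward.
    return 0.6


def mnar(r):
    # Nonignorable: high rewards are recorded more often.
    return 0.9 if r == 1 else 0.3


TRUE_VALUE = 0.6  # value of "always action 1" under the reward model above
n = 200_000
cc_mcar = complete_case_estimate(simulate(n, mcar, seed=1))
data_mnar = simulate(n, mnar, seed=2)
cc_mnar = complete_case_estimate(data_mnar)
ipw_mnar = ipw_estimate(data_mnar, mnar)
print(cc_mcar, cc_mnar, ipw_mnar)
```

Under the nonignorable mechanism the complete-case estimate tends to 0.54 / 0.57 ≈ 0.95 rather than the true value 0.6, because observed rows over-represent high rewards. Dividing each observed term by its response probability removes that selection, since E[δ / p(R) · w · R] = E[w · R] when δ is the observation indicator with P(δ = 1 | R) = p(R).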

machine learning, missingness, reinforcement learning

Country:
- North America
  - Canada (0.04)
  - United States
    - Wisconsin (0.04)
    - North Carolina (0.04)
    - Connecticut (0.04)
- Europe > United Kingdom
  - England > Bristol (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area (1.00)

Technology:
- Information Technology
  - Data Science
    - Data Mining (1.00)
    - Data Quality (0.92)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning
      - Reinforcement Learning (1.00)
      - Statistical Learning > Regression (0.45)
