Optimal Off-Policy Evaluation from Multiple Logging Policies

Kallus, Nathan, Saito, Yuta, Uehara, Masatoshi

Oct-21-2020–arXiv.org Machine Learning

We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. Previous work noted that in this setting the ordering of the variances of different importance sampling estimators is instance-dependent, which raises a dilemma as to which importance sampling weights to use. In this paper, we resolve this dilemma by finding the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one. In particular, we establish the efficiency bound under stratified sampling and propose an estimator achieving this bound when given consistent $q$-estimates. To guard against misspecification of $q$-functions, we also provide a way to choose the control variate in a hypothesis class to minimize variance. Extensive experiments demonstrate the benefits of our methods, which efficiently leverage the stratified sampling of off-policy data from multiple loggers.
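To make the setting concrete, the following is a minimal sketch of stratified logging and three candidate estimators in a small tabular contextual bandit. Everything in it is an assumption made for illustration and not taken from the abstract: the function names `simulate_stratified_logs` and `ope_estimates`, the toy policies and reward table, the uniform context distribution, and the choice of estimators. The "naive" estimator weights each sample by its own logger, the "balanced" estimator weights by the size-weighted mixture of loggers, and the third is a generic doubly robust form that pairs the mixture weights with a $q$-function as control variate. It may or may not coincide with the estimator proposed in the paper.

```python
import numpy as np


def simulate_stratified_logs(loggers, sizes, mean_reward, rng, noise=0.5):
    """Draw exactly sizes[k] samples from logger k (stratified sampling).

    loggers: array-like of shape (K, n_contexts, n_actions), rows sum to 1.
    Contexts are drawn uniformly (an assumption of this toy setup).
    """
    loggers = np.asarray(loggers, dtype=float)
    n_contexts, n_actions = mean_reward.shape
    xs, acts, rews, ks = [], [], [], []
    for k, n_k in enumerate(sizes):
        x = rng.integers(n_contexts, size=n_k)
        a = np.array([rng.choice(n_actions, p=loggers[k, xi]) for xi in x])
        r = mean_reward[x, a] + rng.normal(scale=noise, size=n_k)
        xs.append(x)
        acts.append(a)
        rews.append(r)
        ks.append(np.full(n_k, k))
    return tuple(np.concatenate(v) for v in (xs, acts, rews, ks))


def ope_estimates(x, a, r, k, loggers, sizes, pi_e, q_hat):
    """Return three value estimates for the evaluation policy pi_e."""
    loggers = np.asarray(loggers, dtype=float)
    n = len(r)
    rho = np.asarray(sizes, dtype=float) / n        # stratum proportions
    pi_mix = np.tensordot(rho, loggers, axes=1)     # size-weighted mixture policy
    w_naive = pi_e[x, a] / loggers[k, x, a]         # weight by own logger
    w_bal = pi_e[x, a] / pi_mix[x, a]               # weight by mixture
    direct = (pi_e[x] * q_hat[x]).sum(axis=1)       # E_{a ~ pi_e}[q_hat(x, a)]
    return {
        "naive_is": float(np.mean(w_naive * r)),
        "balanced_is": float(np.mean(w_bal * r)),
        "dr_mixture": float(np.mean(w_bal * (r - q_hat[x, a]) + direct)),
    }


# Hypothetical toy instance: 2 contexts, 3 actions, 2 loggers.
MEAN_REWARD = np.array([[1.0, 0.0, 0.5], [0.2, 0.8, 0.4]])
LOGGERS = np.array([
    [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]],
    [[0.2, 0.2, 0.6], [0.3, 0.3, 0.4]],
])
SIZES = (3000, 1000)
PI_E = np.array([[0.7, 0.2, 0.1], [0.1, 0.7, 0.2]])

rng = np.random.default_rng(0)
x, a, r, k = simulate_stratified_logs(LOGGERS, SIZES, MEAN_REWARD, rng)
# q_hat is set to the true mean reward here purely for illustration.
print(ope_estimates(x, a, r, k, LOGGERS, SIZES, PI_E, MEAN_REWARD))
```

Rerunning the sketch over many seeds and comparing the empirical variances of the three estimates, then changing `LOGGERS` or `SIZES`, is one way to see how the relative performance of importance sampling weights can depend on the instance.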

artificial intelligence, estimator, machine learning

Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)
- Asia > Japan
  - Honshū > Kantō
    - Tokyo Metropolis Prefecture > Tokyo (0.04)
    - Kanagawa Prefecture (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology
  - Data Science (0.67)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning > Statistical Learning (0.46)
