Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Voloshin, Cameron, Le, Hoang M., Jiang, Nan, Yue, Yisong

Nov-15-2019–arXiv.org Artificial Intelligence

Off-policy policy evaluation (OPE) is the problem of estimating the online performance of a policy using only pre-collected historical data generated by another policy. Given the increasing interest in deploying learning-based methods for safety-critical applications, many OPE methods have recently been proposed. Because experimental conditions differ widely across the recent literature, the relative performance of current OPE methods is not well understood. In this work, we present the first comprehensive empirical analysis of a broad suite of OPE methods. Based on thousands of experiments and detailed empirical analyses, we offer a summarized set of guidelines for effectively using OPE in practice, and suggest directions for future research.
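To make the problem statement concrete, the following is a minimal sketch of one textbook OPE estimator, trajectory-wise importance sampling. It is an illustration only and is not taken from the paper. The function name, the `Step` layout, and the toy data are all hypothetical, and the sketch assumes the action probabilities of both policies are known and that the behavior policy gives nonzero probability to every logged action.

```python
from typing import Callable, List, Tuple

# One step of logged data: (state, action, reward). Hypothetical layout.
Step = Tuple[int, int, float]


def importance_sampling_estimate(
    trajectories: List[List[Step]],
    behavior_prob: Callable[[int, int], float],
    target_prob: Callable[[int, int], float],
    gamma: float = 1.0,
) -> float:
    """Trajectory-wise importance sampling estimate of the target policy's value."""
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0             # cumulative likelihood ratio for this trajectory
        discounted_return = 0.0  # discounted sum of logged rewards
        for t, (state, action, reward) in enumerate(trajectory):
            # Assumes behavior_prob(state, action) > 0 for every logged pair.
            weight *= target_prob(state, action) / behavior_prob(state, action)
            discounted_return += (gamma ** t) * reward
        total += weight * discounted_return
    return total / len(trajectories)


# Toy data: a single state (0), two actions, one-step episodes.
# The behavior policy picks each action with probability 0.5;
# the target policy always picks action 1, which pays reward 1.
logged = [[(0, 1, 1.0)], [(0, 0, 0.0)]]


def behavior(state: int, action: int) -> float:
    return 0.5


def target(state: int, action: int) -> float:
    return 1.0 if action == 1 else 0.0


estimate = importance_sampling_estimate(logged, behavior, target)
print(estimate)
```

On this toy data the estimate matches the target policy's true value of 1, even though half of the logged episodes took the other action. With longer horizons the product of ratios can vary widely, which is one reason many alternative estimators exist.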

Country:
- North America > Canada > Quebec > Montreal (0.04)

Genre:
- Research Report (0.63)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
