Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy

Apr-2-2024–arXiv.org Machine Learning

In many decision-making problems, estimating the value of a policy, that is, its expected reward, is a central question. Online evaluation, which requires running each policy to measure its value, can be expensive and is impractical when there are many target policies to compare. Off-policy evaluation (OPE) instead estimates the value of a target policy from log data generated by a different logging policy. This approach has attracted considerable interest in contextual bandits (CB) [Dudík et al., 2011, Swaminathan et al., 2017] and reinforcement learning (RL) [Precup, 2000, Mahmood et al., 2014, Jiang and Li, 2016]. Many off-policy evaluation algorithms currently in use [Dudík et al., 2011, Thomas and Brunskill, 2016, Wang et al., 2017, Farajtabar et al., 2018, Su et al., 2020] rely on inverse probability weighting (IPW) and therefore require complete knowledge of the logging policy.
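
As a minimal sketch, assuming discrete contexts and actions (the abstract specifies neither), the Python code below contrasts IPW with a known logging policy, IPW with a plug-in estimate of the logging policy, and the standard doubly-robust (DR) estimator of Dudík et al. [2011]. It is not the estimator proposed in this paper. The function names, the empirical-frequency estimate mu_hat(a|x) = n(x, a) / n(x), the cell-mean reward model q_hat, and the simulated policies and rewards are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# One logged interaction: (context x, action a, observed reward r).
Sample = Tuple[int, int, float]


def estimate_logging_policy(data: List[Sample], n_actions: int) -> Dict[int, List[float]]:
    """Plug-in estimate mu_hat(a | x) = n(x, a) / n(x); assumes discrete contexts."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0] * n_actions)
    for x, a, _ in data:
        counts[x][a] += 1
    return {x: [c / sum(row) for c in row] for x, row in counts.items()}


def fit_reward_model(data: List[Sample], default: float = 0.0) -> Callable[[int, int], float]:
    """Hypothetical outcome model q_hat(x, a): mean logged reward in cell (x, a)."""
    sums: Dict[Tuple[int, int], float] = defaultdict(float)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for x, a, r in data:
        sums[(x, a)] += r
        counts[(x, a)] += 1

    def q_hat(x: int, a: int) -> float:
        n = counts.get((x, a), 0)
        return sums[(x, a)] / n if n else default  # unseen cells fall back to `default`

    return q_hat


def ipw_estimate(data, target, log_policy) -> float:
    """IPW: (1/n) * sum_i pi(a_i | x_i) / mu(a_i | x_i) * r_i."""
    return sum(target[x][a] / log_policy[x][a] * r for x, a, r in data) / len(data)


def dr_estimate(data, target, log_policy, q_hat, n_actions: int) -> float:
    """DR: direct-method term plus an importance-weighted correction of q_hat's residual."""
    total = 0.0
    for x, a, r in data:
        direct = sum(target[x][b] * q_hat(x, b) for b in range(n_actions))
        weight = target[x][a] / log_policy[x][a]
        total += direct + weight * (r - q_hat(x, a))
    return total / len(data)


def simulate(n: int, seed: int = 0):
    """Hypothetical two-context, two-action bandit; contexts are drawn uniformly."""
    rng = random.Random(seed)
    mu = [[0.7, 0.3], [0.4, 0.6]]  # true logging policy mu[x][a]
    pi = [[0.2, 0.8], [0.9, 0.1]]  # target policy pi[x][a]
    q = [[0.3, 0.6], [0.5, 0.2]]   # mean Bernoulli reward q[x][a]
    data: List[Sample] = []
    for _ in range(n):
        x = rng.randrange(2)
        a = 0 if rng.random() < mu[x][0] else 1
        r = 1.0 if rng.random() < q[x][a] else 0.0
        data.append((x, a, r))
    # True value: E_x sum_a pi(a|x) q(x, a), with P(x) = 0.5 for each context.
    true_value = sum(0.5 * pi[x][b] * q[x][b] for x in range(2) for b in range(2))
    return data, mu, pi, true_value


if __name__ == "__main__":
    data, mu, pi, v = simulate(20_000)
    mu_hat = estimate_logging_policy(data, n_actions=2)
    q_hat = fit_reward_model(data)
    print(f"true value     {v:.4f}")
    print(f"IPW, known mu  {ipw_estimate(data, pi, mu):.4f}")
    print(f"IPW, mu_hat    {ipw_estimate(data, pi, mu_hat):.4f}")
    print(f"DR,  mu_hat    {dr_estimate(data, pi, mu_hat, q_hat, n_actions=2):.4f}")
```

Because this sketch fits mu_hat and q_hat on the same logs with one cell per (x, a), IPW with mu_hat and DR with mu_hat both reduce to the direct-method average of q_hat (the DR residuals sum to zero within each cell). The estimators diverge once contexts are continuous, the models are parametric, or the nuisance models are fitted on separate samples.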

Keywords: asymptotic variance, doubly-robust off-policy evaluation, estimator, …
