Off-Policy Evaluation with Policy-Dependent Optimization Response
