Minimax Weight and Q-Function Learning for Off-Policy Evaluation
