A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes
