Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

Wen Sun

Neural Information Processing Systems 

We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP), using transition observations from the original MDP, whether they are generated under the same or a different policy. This is an important problem when there is the possibility of a shift between historical and future environments, e.g.