Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes
Wen Sun
–Neural Information Processing Systems
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP), using transition observations from the original MDP, whether they are generated under the same or a different policy. This is an important problem when there may be a shift between historical and future environments, e.g.
Jun-1-2025, 20:41:37 GMT
- Country:
- North America > United States (0.67)
- Genre:
- Research Report > Experimental Study (0.67)
- Industry: