Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes
Andrew Bennett

Neural Information Processing Systems 

MDP, whether they are generated under the same or a different policy. This is an important problem when there is the possibility of a shift between historical and future environments, e.g.
