Safe Policy Improvement by Minimizing Robust Baseline Regret

Mohammad Ghavamzadeh, Marek Petrik, Yinlam Chow

Neural Information Processing Systems 

In this paper, we develop and analyze a new model-based approach that computes a safe policy, given an inaccurate model of the system's
