Reviews: Safe Policy Improvement by Minimizing Robust Baseline Regret

Jan-20-2025, 16:07:51 GMT–Neural Information Processing Systems

The paper studies robust Markov decision processes (MDPs). In particular, it assumes that we have an uncertain model of the MDP's transition probabilities (with rectangular uncertainty sets) as well as a baseline policy. The goal is to find a new policy that is guaranteed to perform no worse than the baseline; this is called safe policy improvement. A robust approach would find a policy that maximizes worst-case performance.
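To make the two criteria in this summary concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a small finite set of candidate transition models in place of a rectangular uncertainty set, enumerates deterministic policies only, and uses made-up toy numbers. All names (`policy_return`, `worst_case_return`, `worst_case_improvement`, `models`) are hypothetical.

```python
from itertools import product

def policy_return(policy, P, r, p0, gamma=0.9, iters=500):
    """Discounted return of a deterministic policy under one model P[s][a][t]."""
    n = len(p0)
    v = [0.0] * n
    for _ in range(iters):  # iterative policy evaluation
        v = [r[s][policy[s]]
             + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
             for s in range(n)]
    return sum(p0[s] * v[s] for s in range(n))

def worst_case_return(policy, models, r, p0):
    """Robust criterion: return under the least favorable candidate model."""
    return min(policy_return(policy, P, r, p0) for P in models)

def worst_case_improvement(policy, baseline, models, r, p0):
    """Safety criterion: smallest gain over the baseline across candidate models."""
    return min(policy_return(policy, P, r, p0) - policy_return(baseline, P, r, p0)
               for P in models)

# Toy problem (assumed values): 2 states, 2 actions, 2 candidate models.
r = [[0.0, 1.0], [2.0, 0.0]]   # r[s][a]
p0 = [1.0, 0.0]                # initial state distribution
models = [
    [[[0.9, 0.1], [0.2, 0.8]], [[0.1, 0.9], [0.7, 0.3]]],
    [[[0.5, 0.5], [0.6, 0.4]], [[0.4, 0.6], [0.3, 0.7]]],
]
baseline = (0, 0)              # baseline policy: action 0 in every state

policies = list(product(range(2), repeat=2))
robust = max(policies, key=lambda pi: worst_case_return(pi, models, r, p0))
safe = max(policies,
           key=lambda pi: worst_case_improvement(pi, baseline, models, r, p0))
```

Because the baseline is itself a candidate and its improvement over itself is zero, the policy chosen by the second criterion never has a negative worst-case improvement in this sketch, which is the "no worse than the baseline" guarantee described above.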

approximation, baseline policy, minimizing robust baseline regret, (5 more...)



Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.54)