Safe Policy Improvement by Minimizing Robust Baseline Regret