Mitigating Goal Misgeneralization via Minimax Regret
