Discovering a set of policies for the worst case reward