An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set