The Logic of Benchmarking: A Case Against State-of-the-Art Performance
Ruml, Wheeler (University of New Hampshire)
The second is causal analysis: isolate what causes a performance improvement and leave everything else unchanged, hopefully leaving the uncontrolled biases the same between the two conditions. Note that nothing in this disturbing anecdote relates to benchmark size. Both short- and long-running benchmarks were affected, and the recommendation is to increase benchmark diversity, which requires more benchmarks, which must therefore be smaller (in order to fit into the same PhD program or yearly conference timescale). Large benchmarks therefore inhibit correction of measurement bias.

I have seen several reviewers claim that experimental evaluations must use problem instances that are large enough to take more than x time to solve, where x has varied from a few seconds to many minutes. The fact that the stated values for x have varied over two orders of magnitude is perhaps one indication that the criterion has no firm basis, but let's consider it in more depth. I believe that it represents a serious problem: I have seen a paper (not from my group) proposing a novel and interesting algorithm that soundly beat the state-of-the-art by orders of magnitude rejected
Aug-25-2010