The Logic of Benchmarking: A Case Against State-of-the-Art Performance

Ruml, Wheeler (University of New Hampshire)

Aug-25-2010–AAAI Conferences

The second is causal analysis: isolate what causes a performance improvement and leave everything else unchanged, hopefully leaving the uncontrolled biases the same between the two conditions. Note that nothing in this disturbing anecdote relates to benchmark size. Both short- and long-running benchmarks were affected, and the recommendation is to increase benchmark diversity, which requires more benchmarks, which must therefore be smaller (in order to fit into the same PhD program or yearly conference timescale). Large benchmarks therefore inhibit correction of measurement bias.

I have seen several reviewers claim that experimental evaluations must use problem instances that are large enough to take more than x time to solve, where x has varied from a few seconds to many minutes. The fact that the stated values for x have varied over two orders of magnitude is perhaps one indication that the criterion has no firm basis, but let's consider it in more depth. I believe that it represents a serious problem: I have seen a paper (not from my group) proposing a novel and interesting algorithm that soundly beat the state-of-the-art by orders of magnitude rejected

benchmark, benchmarking, variation, (16 more...)

Country:
- North America > United States
  - New Hampshire (0.05)
- Asia > Middle East
  - Jordan (0.05)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
