Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

Feb-3-2024–arXiv.org Artificial Intelligence

We study how best to spend a budget of noisy labels when comparing the accuracy of two binary classifiers. It's common practice to collect multiple noisy labels for each data point and aggregate them into a less noisy label by majority vote. We prove a theorem that runs counter to this conventional wisdom: if the goal is to identify the better of two classifiers, it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where our results overturn some time-honored recommendations. In addition, our results provide sample-size bounds that are tighter than those obtained from Hoeffding's bound.
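
The trade-off above can be explored numerically. The sketch below is not the authors' code or their exact model. It assumes symmetric annotator noise (each label is flipped independently with probability p), that the two classifiers err independently of each other and of the annotators, and that the "better" classifier is chosen by counting agreements with the (possibly majority-voted) labels on the same points. Under those assumptions it computes the exact probability of picking the better classifier when a budget of labels is spent either as one label per point or as k labels per point. It also computes two sample-size bounds. The first is a Chernoff bound whose exponent is the Cramér rate function at zero; the second is Hoeffding's bound for a variable in [-1, 1]. All function names (majority_noise, diff_distribution, prob_pick_a, chernoff_rate, hoeffding_rate, labels_needed) are hypothetical.

```python
import math

# Hypothetical helpers; model assumptions: symmetric label noise p, and
# classifiers A, B that err independently of each other and of annotators.


def majority_noise(p: float, k: int) -> float:
    """Chance that a majority vote over k labels (k odd), each flipped
    independently with probability p, disagrees with the true label."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    return sum(math.comb(k, j) * p**j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))


def diff_distribution(acc_a: float, acc_b: float, noise: float):
    """Distribution of D = 1[A matches label] - 1[B matches label] on one point.

    If A and B are both right or both wrong, D = 0 whatever the label says.
    Returns (P(D = +1), P(D = 0), P(D = -1)).
    """
    a_only = acc_a * (1 - acc_b)  # A right, B wrong
    b_only = (1 - acc_a) * acc_b  # A wrong, B right
    p_plus = a_only * (1 - noise) + b_only * noise
    p_minus = a_only * noise + b_only * (1 - noise)
    return p_plus, 1.0 - p_plus - p_minus, p_minus


def prob_pick_a(acc_a, acc_b, p, budget, k=1):
    """Exact probability that the agreement count prefers A (ties split 50/50)
    when `budget` labels are spent as budget // k points with k labels each."""
    n = budget // k
    p_plus, p_zero, p_minus = diff_distribution(acc_a, acc_b, majority_noise(p, k))
    dist = [0.0] * (2 * n + 1)  # dist[s + n] = P(running sum of D == s)
    dist[n] = 1.0
    for _ in range(n):  # after m steps the support is [n - m, n + m]
        new = [0.0] * (2 * n + 1)
        for i, mass in enumerate(dist):
            if mass:
                new[i] += mass * p_zero
                new[i + 1] += mass * p_plus
                new[i - 1] += mass * p_minus
        dist = new
    return sum(dist[n + 1:]) + 0.5 * dist[n]


def chernoff_rate(p_plus, p_zero, p_minus):
    """Exponent r with P(sum of D <= 0) <= exp(-n r). Minimizing E[exp(-t D)]
    over t >= 0 gives p_zero + 2 * sqrt(p_plus * p_minus)."""
    if p_plus <= p_minus:
        raise ValueError("assumes A is strictly better (p_plus > p_minus)")
    return -math.log(p_zero + 2.0 * math.sqrt(p_plus * p_minus))


def hoeffding_rate(p_plus, p_minus):
    """Hoeffding exponent for D in [-1, 1]: E[D]^2 / 2."""
    mu = p_plus - p_minus
    return mu * mu / 2.0


def labels_needed(rate, delta, k=1):
    """Total labels k * n for the smallest n with exp(-n * rate) <= delta."""
    return k * math.ceil(math.log(1.0 / delta) / rate)


if __name__ == "__main__":
    # Hypothetical setting: A is 80% accurate, B 75%, annotators flip 30% of labels.
    acc_a, acc_b, p, budget = 0.80, 0.75, 0.30, 600
    for k in (1, 3):
        print(f"k={k}: P(pick A) = {prob_pick_a(acc_a, acc_b, p, budget, k):.3f}")
    for k in (1, 3):
        d = diff_distribution(acc_a, acc_b, majority_noise(p, k))
        print(f"k={k}: labels for 95% confidence: "
              f"Chernoff {labels_needed(chernoff_rate(*d), 0.05, k)}, "
              f"Hoeffding {labels_needed(hoeffding_rate(d[0], d[2]), 0.05, k)}")
```

In this toy setting, majority voting lowers the effective noise, but each point then consumes k labels. Comparing the per-label exponents (chernoff_rate for k = 1 against chernoff_rate / k for k = 3) is one way to see which effect wins under these assumptions. The Chernoff exponent can never be smaller than the Hoeffding exponent, because it optimizes the exact moment generating function instead of an upper bound on it.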

classifier, machine learning, natural language

Country:
- Europe
  - Germany > Baden-Württemberg
    - Tübingen Region > Tübingen (0.14)
  - Spain (0.14)

Genre:
- Research Report > New Finding (0.86)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.46)
  - Natural Language (1.00)