Identifying Statistical Bias in Dataset Replication

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Jacob Steinhardt, Aleksander Madry

arXiv.org (Machine Learning)

The primary objective of supervised learning is to develop models that generalize robustly to unseen data. Benchmark test sets provide a proxy for out-of-sample performance, but they can outlive their usefulness in at least two ways. First, evaluating on benchmarks alone may steer us towards models that adaptively overfit [Reu03; RFR08; Dwo+15] to the finite test set and do not generalize. Second, we might select for models that are sensitive to insignificant aspects of the dataset creation process and thus do not generalize robustly (e.g., models that are sensitive to the exact set of humans who annotated the test set). To diagnose these issues, recent work has generated new, previously "unseen" testbeds for standard datasets through a process known as dataset replication. Though not yet widespread in machine learning, dataset replication is a natural analogue to experimental replication studies in the natural sciences (cf.
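
To make "adaptive overfitting" concrete, here is a minimal simulation (a hypothetical Python sketch, not code from the paper; the names p, n_test, and n_models are illustrative): many candidate models share the same true accuracy p, but reporting the best score among them on a fixed, finite test set yields an estimate that is inflated relative to evaluation on fresh data.

    import numpy as np

    rng = np.random.default_rng(0)
    p, n_test, n_models = 0.80, 1_000, 200  # true accuracy, test-set size, candidates

    # Each candidate's benchmark score: fraction correct on the same n_test examples.
    benchmark_scores = rng.binomial(n_test, p, size=n_models) / n_test
    best = benchmark_scores.argmax()

    # Re-evaluating the selected model on a fresh draw exposes the optimistic bias.
    fresh_score = rng.binomial(n_test, p) / n_test

    print(f"best benchmark score: {benchmark_scores[best]:.3f}")  # inflated above the true p
    print(f"fresh-data score:     {fresh_score:.3f}")             # near the true p = 0.80

Increasing n_test or decreasing n_models shrinks the gap between the benchmark and fresh-data scores, which is why repeated model selection against the same finite benchmark is what drives this bias.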
