BREEDS: Benchmarks for Subpopulation Shift
Shibani Santurkar, Dimitris Tsipras, Aleksander Madry
Robustness to distribution shift has been the focus of a long line of work in machine learning [SG86; WK93; KHA99; Shi00; SKM07; Qui09; Mor12; SK12]. At a high level, the goal is to ensure that models perform well not only on unseen samples from the datasets they are trained on, but also on the diverse set of inputs they are likely to encounter in the real world. However, building benchmarks for evaluating such robustness is challenging: it requires modeling realistic data variations in a way that is well-defined, controllable, and easy to simulate. Prior work in this context has focused on building benchmarks that capture distribution shifts caused by natural or adversarial input corruptions [Sze14; FF15; FMF16; Eng19a; For19; HD19; Kan19], differences in data sources [Sae10; TE11; Kho12; TT14; Rec19], and changes in the frequencies of data subpopulations [Ore19; Sag20]. While each of these approaches captures a different source of real-world distribution shift, we cannot expect any single benchmark to be comprehensive. Thus, to obtain a holistic understanding of model robustness, we need to keep expanding our testbed to encompass more natural modes of variation.
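The subpopulation shift setting referenced above can be illustrated with a minimal sketch: for each superclass, train on one set of subpopulations and evaluate on a disjoint, unseen set. The class names and split logic here are purely hypothetical illustrations, not the actual BREEDS hierarchy or methodology.

```python
import random

# Hypothetical superclass -> subpopulation mapping (illustrative names only,
# not the actual BREEDS class hierarchy).
superclasses = {
    "dog": ["terrier", "retriever", "husky", "poodle"],
    "cat": ["siamese", "tabby", "persian", "sphynx"],
}

def subpopulation_split(superclasses, seed=0):
    """For each superclass, hold out half of its subpopulations:
    models train on one half and are evaluated on the unseen half."""
    rng = random.Random(seed)
    train, test = {}, {}
    for sup, subs in superclasses.items():
        subs = subs[:]          # copy before shuffling
        rng.shuffle(subs)
        half = len(subs) // 2
        train[sup], test[sup] = subs[:half], subs[half:]
    return train, test

train, test = subpopulation_split(superclasses)
# Superclass labels are shared across the splits, but the subpopulations
# that realize each label differ between train and test.
assert all(set(train[s]).isdisjoint(test[s]) for s in superclasses)
```

A model that learns features generic to the superclass should transfer across this split, while one that latches onto subpopulation-specific cues will not.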
Aug-11-2020