BREEDS: Benchmarks for Subpopulation Shift
Shibani Santurkar, Dimitris Tsipras, Aleksander Madry
Robustness to distribution shift has been the focus of a long line of work in machine learning [SG86; WK93; KHA99; Shi00; SKM07; Qui09; Mor12; SK12]. At a high level, the goal is to ensure that models perform well not only on unseen samples from the datasets they are trained on, but also on the diverse set of inputs they are likely to encounter in the real world. However, building benchmarks for evaluating such robustness is challenging: it requires modeling realistic data variations in a way that is well-defined, controllable, and easy to simulate. Prior work in this context has focused on building benchmarks that capture distribution shifts caused by natural or adversarial input corruptions [Sze14; FF15; FMF16; Eng19a; For19; HD19; Kan19], differences in data sources [Sae10; TE11; Kho12; TT14; Rec19], and changes in the frequencies of data subpopulations [Ore19; Sag20]. While each of these approaches captures a different source of real-world distribution shift, we cannot expect any single benchmark to be comprehensive. Thus, to obtain a holistic understanding of model robustness, we need to keep expanding our testbed to encompass more natural modes of variation.
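The subpopulation shift setting referenced above can be illustrated with a minimal sketch: for each superclass, train on one set of subpopulations and evaluate on a disjoint, unseen set. The class names and split logic here are purely hypothetical illustrations, not the actual BREEDS hierarchy or methodology.

```python
import random

# Hypothetical superclass -> subpopulation mapping (illustrative names only,
# not the actual BREEDS class hierarchy).
superclasses = {
    "dog": ["terrier", "retriever", "husky", "poodle"],
    "cat": ["siamese", "tabby", "persian", "sphynx"],
}

def subpopulation_split(superclasses, seed=0):
    """For each superclass, hold out half of its subpopulations:
    models train on one half and are evaluated on the unseen half."""
    rng = random.Random(seed)
    train, test = {}, {}
    for sup, subs in superclasses.items():
        subs = subs[:]          # copy before shuffling
        rng.shuffle(subs)
        half = len(subs) // 2
        train[sup], test[sup] = subs[:half], subs[half:]
    return train, test

train, test = subpopulation_split(superclasses)
# Superclass labels are shared across the splits, but the subpopulations
# that realize each label differ between train and test.
assert all(set(train[s]).isdisjoint(test[s]) for s in superclasses)
```

A model that learns features generic to the superclass should transfer across this split, while one that latches onto subpopulation-specific cues will not.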
Aug-11-2020