We use iWildCam version 2.0, released in 2021. Figure 14: random examples from the iWildCam train set and the out-of-distribution test set. Figure 15: random examples from the ImageNet ILSVRC 2012 challenge train set [37, 11]; the full training set is notably not class balanced, exhibiting a long-tailed distribution (see Figure 16). Figure 17: random examples from the iNaturalist 2017 challenge train set [46].
Effective Robustness against Natural Distribution Shifts for Models with Different Training Data
Existing effective robustness evaluations typically use a single test set such as ImageNet to evaluate the ID accuracy. This becomes problematic when evaluating models trained on different data distributions, e.g., comparing models trained on ImageNet vs. zero-shot language-image pre-trained models trained on LAION. In this paper, we propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data. To do this, we control for the accuracy on multiple ID test sets that cover the training distributions for all the evaluated models. Our new evaluation metric provides a better estimate of effective robustness when there are models with different training data. It may also explain the surprising effective robustness gains of zero-shot CLIP-like models exhibited in prior works that used ImageNet as the only ID test set, while the gains diminish under our new evaluation.
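The multi-ID evaluation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the baseline models, accuracies, and the choice of two ID test sets are hypothetical, and the linear trend is fit in probit-transformed accuracy space, a common convention in effective-robustness plots.

```python
import numpy as np
from statistics import NormalDist

def probit(acc):
    """Probit transform of accuracies, as is standard in robustness scatter plots."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(a) for a in np.atleast_1d(acc)])

# Hypothetical baseline models: rows = models, columns = two ID test sets
# (e.g., an ImageNet-style set and a LAION-style set); entries are accuracies.
id_acc = np.array([
    [0.60, 0.55],
    [0.70, 0.66],
    [0.76, 0.72],
    [0.82, 0.79],
])
ood_acc = np.array([0.42, 0.51, 0.57, 0.64])  # matching OOD accuracies

# Fit a linear trend from the (probit) accuracies on BOTH ID test sets to the
# (probit) OOD accuracy, instead of using a single ID test set.
X = np.column_stack([probit(id_acc[:, 0]), probit(id_acc[:, 1]),
                     np.ones(len(id_acc))])
coef, *_ = np.linalg.lstsq(X, probit(ood_acc), rcond=None)

def effective_robustness(model_id_acc, model_ood_acc):
    """OOD accuracy above what the multi-ID trend predicts (probit space)."""
    x = np.concatenate([probit(model_id_acc), [1.0]])
    return probit(model_ood_acc)[0] - x @ coef

# A model on the trend has effective robustness near zero; a model whose OOD
# accuracy exceeds the multi-ID prediction gets a positive score.
print(effective_robustness(np.array([0.76, 0.72]), 0.70))
```

Controlling for several ID accuracies at once is what lets the metric compare models whose training distributions differ, since no single test set is "in-distribution" for all of them.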
Models Out of Line: A Fourier Lens on Distribution Shift Robustness
Improving the accuracy of deep neural networks on out-of-distribution (OOD) data is critical to the acceptance of deep learning in real-world applications. It has been observed that accuracies on in-distribution (ID) versus OOD data follow a linear trend, and models that outperform this baseline are exceptionally rare (and referred to as ``effectively robust"). Recently, some promising approaches have been developed to improve OOD robustness: model pruning, data augmentation, and ensembling or zero-shot evaluating large pretrained models. However, there is still no clear understanding of the conditions on OOD data and model properties that are required to observe effective robustness. We approach this issue by conducting a comprehensive empirical study of diverse approaches that are known to impact OOD robustness on a broad range of natural and synthetic distribution shifts of CIFAR-10 and ImageNet. In particular, we view the effective robustness puzzle through a Fourier lens and ask how spectral properties of both models and OOD data correlate with OOD robustness. We find this Fourier lens offers some insight into why certain robust models, particularly those from the CLIP family, achieve OOD robustness. However, our analysis also makes clear that no known metric is consistently the best explanation of OOD robustness. Thus, to aid future research into the OOD puzzle, we address the gap in publicly-available models with effective robustness by introducing a set of pretrained CIFAR-10 models---$RobustNets$---with varying levels of OOD robustness.
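One ingredient of such a Fourier lens is characterizing where a distribution shift's energy lives in frequency space. The toy sketch below is an illustrative assumption, not the paper's analysis pipeline: it measures the fraction of spectral energy a perturbation carries outside a low-frequency disc, contrasting a broadband shift (white noise) with a smooth low-frequency one (a brightness gradient). The function name and band radius are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def high_freq_fraction(img):
    """Fraction of 2D spectral energy outside a central low-frequency disc."""
    f = np.fft.fftshift(np.fft.fft2(img))      # DC component moves to the center
    energy = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)     # radial frequency per bin
    low = energy[r <= min(h, w) // 8].sum()    # energy inside the low-freq disc
    return 1.0 - low / energy.sum()

n = 64
# Two hypothetical shifts: additive white noise (broadband, mostly high-frequency)
# versus a smooth vertical brightness gradient (mostly low-frequency).
noise_shift = rng.normal(size=(n, n))
gradient_shift = np.outer(np.linspace(0.0, 1.0, n), np.ones(n))

print(high_freq_fraction(noise_shift), high_freq_fraction(gradient_shift))
```

Under this kind of summary statistic, corruptions such as Gaussian noise and smooth shifts such as brightness changes land at opposite ends of the spectrum, which is the sort of distinction a Fourier-based account of OOD robustness relies on.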
Each row is a model and each column is an evaluation setting; a few cells are empty due to resource constraints. As discussed in Section 4.1, multiple models trained on more data achieve positive effective robustness. However, this effect is not uniform: our experiments suggest that neither growing the number of images nor the number of classes in an i.i.d. fashion is sufficient on its own. For one, our experiments consider only i.i.d. data.
We thank the reviewers for their feedback and reply to the major points raised by each reviewer individually. Our paper focuses on ImageNet classification because this is what almost all prior work on robustness has studied; we hope that future work (e.g., transfer learning research) can build on our testbed. Our results are substantially more nuanced than "more data helps": (i) we show that only more data currently helps, which is a strong negative result. Appendix D contains additional results for more granular trends.