imagenet-1k
On the Powerfulness of Textual Outlier Exposure for Visual OoD Detection (Appendix)
A Additional experimental results
This section presents more comprehensive experimental results.
A.1 Comparison with post-hoc methods
We also compare our textual outlier method with post-hoc methods, another prominent family of approaches to OoD detection. We compare against six widely used or recently proposed methods known for their detection performance: MSP [4], ODIN [8], Mahalanobis [7], Energy [10], ReAct [14], and KNN [15]. All baselines follow the settings of their original papers. Among these methods, our textual outlier approach achieves the best performance, as shown in Table 6, further underscoring its effectiveness.
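To make the notion of a post-hoc score concrete, the following is a minimal sketch of two of the listed baselines, MSP and Energy, computed from the logits of an already trained classifier. The function names and the default temperature are illustrative assumptions, not the settings used in the comparison, and the sign convention (higher score means more in-distribution) is chosen here for convenience.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability: higher suggests in-distribution."""
    return max(softmax(logits))


def energy_score(logits, temperature=1.0):
    """Negative free energy, T * logsumexp(logits / T).

    Higher suggests in-distribution. The temperature default is an
    assumption made for this sketch.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return temperature * (m + math.log(sum(math.exp(z - m) for z in scaled)))


# Hypothetical logits: a confident prediction versus a flat one.
confident = [8.0, 0.5, 0.1]
flat = [1.0, 1.0, 1.0]
print(msp_score(confident) > msp_score(flat))
print(energy_score(confident) > energy_score(flat))
```

Both scores need only a forward pass through the classifier, which is what distinguishes post-hoc methods from approaches that change training.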
Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale Supplementary Material
A.1 Details of "stronger recipe"
In Table 1 of our main paper, we evaluate the impact of limited model capacity [1] and small-scale datasets by comparing the results obtained with the "previous training recipe" and our "stronger recipe". The details of the "stronger recipe" are summarized in Table 13.
Table 13: Stronger training strategy used for distillation. "B" and "C" represent strategies for training students on ImageNet-1K and CIFAR100, respectively.
A.2 Numerical results
In Figure 1 of our main paper, we compare the performance gaps between vanilla KD and two logits-based baselines, i.e., DKD [2] and DIST [3], on two datasets of different scales, to demonstrate that vanilla KD is underestimated on small-scale datasets.
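For readers unfamiliar with the term, vanilla KD is commonly written as a weighted sum of a cross-entropy term on the ground-truth label and a temperature-scaled KL term between teacher and student predictions. The sketch below follows that common formulation for a single example. The function names and the values of `temperature` and `alpha` are assumptions for illustration and are not taken from Table 13.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def vanilla_kd_loss(student_logits, teacher_logits, label,
                    temperature=4.0, alpha=0.5):
    """Per-example KD loss: (1 - alpha) * CE + alpha * T^2 * KL(teacher || student).

    temperature and alpha are illustrative defaults, not the paper's settings.
    """
    # Cross-entropy of the student against the ground-truth label (T = 1).
    ce = -log_softmax(student_logits)[label]

    # KL divergence between softened teacher and student distributions.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl


# Hypothetical logits for a 3-class problem with true class 2.
print(vanilla_kd_loss([1.0, 2.0, 3.0], [0.5, 1.5, 4.0], label=2))
```

When the student matches the teacher exactly, the KL term vanishes and only the cross-entropy term remains.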
Supplementary Material
A Evaluation on CIFAR Benchmarks
Setup We additionally evaluate GradNorm on a common benchmark with CIFAR-10 and CIFAR-100 [22] as ID datasets, which is routinely used in the literature [13, 27, 14, 29, 26]. We use the standard split with 50,000 training images and 10,000 test images. The learning rate is initially 0.1 and decays by a factor of 10 at epochs 50, 75, and 90. Results We summarize the results in Table 6, where GradNorm remains competitive. In particular, GradNorm reduces the average FPR95 by 8.77% on CIFAR-10 compared to the best baseline.
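The schedule and the metric mentioned above can be sketched in a few lines. The first function reproduces the stated step decay, assuming 0-indexed epochs with the decay taking effect from the milestone epoch onward. The second computes FPR95 under the usual convention that higher scores indicate ID data: the threshold is chosen so that at least 95% of ID samples are retained, and FPR95 is the fraction of OOD samples that also pass it. The function names and the sample scores are hypothetical.

```python
import math


def step_lr(epoch, base_lr=0.1, milestones=(50, 75, 90), gamma=0.1):
    """Learning rate at a given epoch under step decay.

    Assumes 0-indexed epochs and that each decay applies from the
    milestone epoch onward.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def fpr_at_tpr(id_scores, ood_scores, tpr=0.95):
    """False positive rate on OOD data at a fixed true positive rate on ID data.

    Assumes higher scores mean "more in-distribution".
    """
    ranked = sorted(id_scores)
    n = len(ranked)
    # Drop at most (1 - tpr) of the lowest-scoring ID samples.
    k = max(n - math.ceil(tpr * n), 0)
    threshold = ranked[k]
    false_pos = sum(1 for s in ood_scores if s >= threshold)
    return false_pos / len(ood_scores)


# Hypothetical scores, for illustration only.
id_scores = [float(i) for i in range(1, 21)]
ood_scores = [0.0, 1.0, 2.0, 3.0]
print(step_lr(60))
print(fpr_at_tpr(id_scores, ood_scores))
```

A lower FPR95 is better, since it means fewer OOD inputs are mistaken for ID at the same ID recall.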
SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification
Data curation is the problem of how to collect and organize samples into a dataset that supports efficient learning. Despite the centrality of the task, little work has been devoted to a large-scale, systematic comparison of curation methods. In this work, we take steps towards a formal evaluation of data curation strategies and introduce SELECT, the first large-scale benchmark of curation strategies for image classification. To generate baseline methods for the SELECT benchmark, we create a new dataset, ImageNet++, which constitutes the largest superset of ImageNet-1K to date. Our dataset extends ImageNet with 5 new training-data shifts, each approximately the size of ImageNet-1K, and each assembled using a distinct curation strategy. We evaluate our data curation baselines in two ways: (i) using each training-data shift to train identical image classification models from scratch, and (ii) using it to inspect a fixed pretrained self-supervised representation. Our findings show interesting trends, particularly pertaining to recent methods for data curation such as synthetic data generation and lookup based on CLIP embeddings. We show that although these strategies are highly competitive for certain tasks, the curation strategy used to assemble the original ImageNet-1K dataset remains the gold standard. We anticipate that our benchmark can illuminate the path for new methods to further reduce the gap.