Jang, Minguk
Label Distribution Shift-Aware Prediction Refinement for Test-Time Adaptation
Jang, Minguk, Chung, Hye Won
Test-time adaptation (TTA) is an effective approach to mitigating the performance degradation of trained models when the input distribution shifts at test time. However, existing TTA methods often suffer significant performance drops when facing additional class distribution shifts. We first analyze TTA methods under label distribution shifts and identify class-wise confusion patterns that commonly appear across different covariate shifts. Based on this observation, we introduce label Distribution shift-Aware prediction Refinement for Test-time adaptation (DART), a novel TTA method that refines predictions by focusing on these class-wise confusion patterns. DART trains a prediction refinement module during an intermediate phase by exposing it to batches with diverse class distributions drawn from the training dataset. This module is then used at test time to detect and correct class distribution shifts, significantly improving pseudo-label accuracy for test data. Our method achieves 5-18% gains in accuracy under label distribution shifts on CIFAR-10C, without any performance degradation when there is no label distribution shift. Extensive experiments on CIFAR, PACS, OfficeHome, and ImageNet benchmarks demonstrate DART's ability to correct inaccurate predictions caused by test-time distribution shifts. This improvement boosts the performance of existing TTA methods, making DART a valuable plug-in tool.
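The refinement idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual module: it assumes the refinement has been distilled into a single class-confusion correction matrix `refine_mat` (a hypothetical name) trained beforehand on batches with diverse class distributions, and applies it to a batch of softmax outputs.

```python
import numpy as np

def refine_batch_predictions(probs, refine_mat):
    """Hedged sketch of DART-style prediction refinement: redistribute
    probability mass between commonly confused classes using a correction
    matrix assumed to be trained offline, then renormalize each row."""
    refined = probs @ refine_mat                  # apply confusion correction
    refined = np.clip(refined, 1e-12, None)       # keep probabilities positive
    return refined / refined.sum(axis=1, keepdims=True)  # rows sum to 1
```

With `refine_mat` set to the identity matrix the predictions pass through unchanged, which matches the abstract's claim of no degradation when there is no label distribution shift.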
Test-Time Adaptation via Self-Training with Nearest Neighbor Information
Jang, Minguk, Chung, Sae-Young, Chung, Hye Won
Test-time adaptation (TTA) aims to adapt a trained classifier using only online unlabeled test data, without any information about the training procedure. Most existing TTA methods adapt the trained classifier using its predictions on the test data as pseudo-labels. However, under test-time domain shift the accuracy of these pseudo-labels cannot be guaranteed, so TTA methods often suffer performance degradation in the adapted classifier. To overcome this limitation, we propose a novel test-time adaptation method, called Test-time Adaptation via Self-Training with nearest neighbor information (TAST), which consists of the following procedures: (1) add trainable adaptation modules on top of the trained feature extractor; (2) define a new pseudo-label distribution for the test data using nearest neighbor information; (3) train these modules only a few times during test time to match the nearest neighbor-based pseudo-label distribution and a prototype-based class distribution for the test data; and (4) predict the label of each test sample using the average predicted class distribution over these modules. The pseudo-label generation is based on the intuition that a test sample and its nearest neighbors in the embedding space are likely to share the same label under domain shift. By utilizing multiple randomly initialized adaptation modules, TAST extracts information useful for classifying the test data under domain shift from the nearest neighbor information. TAST showed better performance than state-of-the-art TTA methods on two standard benchmark tasks: domain generalization, namely VLCS, PACS, OfficeHome, and TerraIncognita, and image corruption, particularly CIFAR-10/100C.
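Step (2) above, the nearest neighbor-based pseudo-label distribution, can be sketched as below. This is an illustrative approximation, not the paper's exact procedure: function and argument names, the use of cosine similarity, and the choice of k are all assumptions; each test embedding's pseudo-label distribution is taken as the average predicted class distribution of its k nearest support embeddings.

```python
import numpy as np

def nn_pseudo_label_distribution(test_emb, support_emb, support_probs, k=3):
    """Hedged sketch: average the predicted class distributions of each test
    point's k nearest neighbors (by cosine similarity) in a support set."""
    # L2-normalize so the dot product equals cosine similarity
    t = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    sim = t @ s.T                                  # (n_test, n_support)
    # indices of the k most similar support points for each test point
    nn_idx = np.argsort(-sim, axis=1)[:, :k]
    # average the neighbors' class distributions -> (n_test, n_classes)
    return support_probs[nn_idx].mean(axis=1)
```

This reflects the abstract's intuition that nearby points in the embedding space are likely to share a label even under domain shift.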
Few-Example Clustering via Contrastive Learning
Jang, Minguk, Chung, Sae-Young
In this paper, we propose Few-Example Clustering (FEC), a novel clustering algorithm based on the hypothesis that the contrastive learner with the ground-truth cluster assignment is trained faster than the others. This hypothesis is built on the phenomenon that deep neural networks initially learn patterns from the training examples. FEC is composed of the following three steps (see Figure 1): (1) generation of candidate cluster assignments, (2) contrastive learning for each cluster assignment, and (3) selection of the best candidate.
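Step (3), selecting the best candidate under the training-speed hypothesis, can be sketched as follows. This is a minimal illustration, not the paper's exact criterion: it assumes the contrastive losses of each candidate have been recorded per step (the `loss_curves` layout is an assumption) and measures training speed as the total loss decrease.

```python
def select_best_assignment(loss_curves):
    """Hedged sketch of FEC's candidate selection: under the hypothesis that
    the ground-truth cluster assignment trains fastest, pick the candidate
    whose contrastive loss dropped the most over the recorded steps.
    `loss_curves` maps a candidate id to its list of per-step losses."""
    drops = {cid: curve[0] - curve[-1] for cid, curve in loss_curves.items()}
    return max(drops, key=drops.get)
```

For example, a candidate whose loss falls from 2.0 to 0.3 would be chosen over one whose loss only falls from 2.0 to 1.6.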