Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors

Lee, Jonghyun, Jung, Dahuin, Lee, Saehyung, Park, Junsung, Shin, Juhyeon, Hwang, Uiwon, Yoon, Sungroh

arXiv.org Artificial Intelligence 

The primary challenge of test-time adaptation (TTA) is limited access to the entire test dataset during online updates, which causes error accumulation. To mitigate this, TTA methods have used the entropy of the model output as a confidence metric to determine which samples are less likely to cause errors. Through experimental studies, however, we observed that entropy is an unreliable confidence metric for TTA under biased scenarios, and we theoretically revealed that this unreliability stems from neglecting the influence of latent disentangled factors of data on predictions. Building upon these findings, we introduce a novel TTA method named Destroy Your Object (DeYO), which leverages a newly proposed confidence metric named Pseudo-Label Probability Difference (PLPD). PLPD quantifies the influence of the shape of an object on prediction by measuring the difference between predictions before and after applying an object-destructive transformation. DeYO consists of sample selection and sample weighting, which employ entropy and PLPD concurrently. For robust adaptation, DeYO prioritizes samples whose predictions rely predominantly on shape information. Our extensive experiments demonstrate the consistent superiority of DeYO over baseline methods across various scenarios, including biased and wild ones.

Although deep neural networks (DNNs) demonstrate powerful performance across various domains, they lack robustness against distribution shifts under conventional training (He et al., 2016; Pan & Yang, 2009). Therefore, research areas such as domain generalization (Blanchard et al., 2011; Gulrajani & Lopez-Paz, 2021), which involves training models to be robust against arbitrary distribution shifts, and unsupervised domain adaptation (UDA) (Ganin & Lempitsky, 2015; Park et al., 2020), which seeks domain-invariant information for label-absent target domains, have been extensively investigated in the existing literature.

Test-time adaptation (TTA) (Wang et al., 2021a) has also gained significant attention as a means to address distribution shifts occurring during test time. TTA uses each data point once for adaptation, immediately after inference. Its minimal overhead compared to these existing areas makes it particularly suitable for real-world applications (Azimi et al., 2022). Because UDA assumes access to the entire set of test samples before adaptation, it can exploit task information by analyzing the distribution of the entire test set (Kang et al., 2019). TTA, in contrast, has only limited access to the test set during online updates. This leads to inaccurate predictions, and incorporating them into model updates results in error accumulation within the model (Arazo et al., 2020).
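The two confidence signals named in the abstract, entropy and PLPD, can be illustrated with a minimal sketch. It assumes patch shuffling as the object-destructive transformation, since the text above does not specify which transformation is used. The function names, the `model` callable (image in, list of logits out), and the threshold rule in `is_reliable` are all hypothetical and chosen for illustration. This is not the authors' implementation.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def patch_shuffle(image, patches_per_side, rng):
    """Assumed object-destructive transformation: cut a 2D image (list of
    rows) into a grid of patches and put them back in random order.
    Height and width must be divisible by patches_per_side."""
    h, w = len(image), len(image[0])
    ph, pw = h // patches_per_side, w // patches_per_side
    patches = [
        [row[c * pw:(c + 1) * pw] for row in image[r * ph:(r + 1) * ph]]
        for r in range(patches_per_side)
        for c in range(patches_per_side)
    ]
    rng.shuffle(patches)
    out = []
    for r in range(patches_per_side):
        row_patches = patches[r * patches_per_side:(r + 1) * patches_per_side]
        for i in range(ph):
            out.append([v for p in row_patches for v in p[i]])
    return out


def plpd(model, image, transform):
    """Pseudo-label probability difference: probability of the predicted
    class on the original image minus its probability on the transformed
    image."""
    probs = softmax(model(image))
    pseudo_label = max(range(len(probs)), key=probs.__getitem__)
    probs_destroyed = softmax(model(transform(image)))
    return probs[pseudo_label] - probs_destroyed[pseudo_label]


def is_reliable(ent, plpd_value, ent_threshold, plpd_threshold):
    """Hypothetical selection rule: keep a sample only if its entropy is low
    AND destroying the object shape clearly lowers the prediction."""
    return ent < ent_threshold and plpd_value > plpd_threshold
```

A transform can be built as `lambda img: patch_shuffle(img, 2, random.Random(0))` and passed to `plpd`. In this sketch, a model whose output depends only on overall pixel statistics gives a PLPD of zero under patch shuffling, whereas a model that relies on spatial layout tends to give a larger value, which is the distinction that entropy alone cannot make.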
