Test Time Adaptation via Conjugate Pseudo-labels

Neural Information Processing Systems 

Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts using only unlabeled test samples from the new domain, available at test time. Prior TTA methods optimize unsupervised objectives such as the entropy of model predictions in TENT (Wang et al., 2021), but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising phenomenon: if we attempt to \textit{meta-learn} the ``best'' possible TTA loss over a wide class of functions, then we recover a function that is \textit{remarkably} similar to (a temperature-scaled version of) the softmax-entropy employed by TENT. This holds, however, only if the classifier being adapted is trained via cross-entropy loss; if the classifier is trained via squared loss, a different ``best'' TTA loss emerges. To explain this phenomenon, we analyze test-time adaptation through the lens of the training loss's \textit{convex conjugate}. We show that under natural conditions, this (unsupervised) conjugate function can be viewed as a good local approximation to the original supervised loss, and indeed, it recovers the ``best'' losses found by meta-learning.
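To make the conjugate view concrete, here is a minimal NumPy sketch. It assumes the training loss over logits h can be written as L(h, y) = f(h) - y^T h, up to terms independent of h. Cross-entropy with a one-hot label fits this form with f = log-sum-exp, and squared loss fits it with f(h) = ||h||^2 / 2. Under that assumption, the unsupervised conjugate loss is -f*(grad f(h)) = f(h) - grad f(h)^T h. The helper names (`conjugate_loss`, `squared_f`, and so on) are hypothetical illustrations, not the paper's code. The sketch checks that this loss equals the softmax entropy for cross-entropy and equals -||h||^2 / 2 for squared loss.

```python
import numpy as np

def logsumexp(h):
    # Numerically stable log(sum(exp(h))); this is f for cross-entropy.
    m = np.max(h)
    return m + np.log(np.sum(np.exp(h - m)))

def softmax(h):
    # Gradient of logsumexp with respect to the logits h.
    z = np.exp(h - np.max(h))
    return z / z.sum()

def squared_f(h):
    # f for squared loss: 0.5 * ||h - y||^2 = 0.5 * ||h||^2 - y.h + const.
    return 0.5 * h @ h

def squared_grad(h):
    # Gradient of 0.5 * ||h||^2 is h itself.
    return h

def conjugate_loss(h, f, grad_f):
    # Hypothetical helper: -f*(grad_f(h)) = f(h) - grad_f(h) . h,
    # using the convex-conjugate identity f*(grad_f(h)) = grad_f(h).h - f(h).
    return f(h) - grad_f(h) @ h

def softmax_entropy(h, temperature=1.0):
    # Entropy of softmax(h / temperature), the TENT-style objective.
    p = softmax(h / temperature)
    return -np.sum(p * np.log(p))

h = np.array([2.0, -1.0, 0.5])  # illustrative logits, not from the paper
print(conjugate_loss(h, logsumexp, softmax), softmax_entropy(h))
print(conjugate_loss(h, squared_f, squared_grad))  # -0.5 * ||h||^2
```

Dividing the logits by a temperature before applying `conjugate_loss` with log-sum-exp gives the entropy of the temperature-scaled softmax. This is one way to read the "temperature-scaled" softmax-entropy mentioned above.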