A Experimental Protocol

Neural Information Processing Systems 

We selected hyperparameters using the four disjoint validation corruptions provided with CIFAR-10-C and ImageNet-C [12]. Because the other benchmarks consist only of test sets and provide no validation data, we reused the hyperparameters selected on the corruption validation sets and did not perform any additional tuning. We performed a grid search over the following hyperparameters. Beyond the learning rate and the number of gradient steps, we also evaluated a simple "threshold" rule that performs adaptation only when the marginal entropy exceeds 50% of its maximum value, which is log K for K classes (log 1000 for ImageNet-C); we found that this rule resulted in slightly worse validation performance. We also considered different values of the prior strength N for single point BN adaptation and found that N = 16 performed best on the validation sets, consistent with the value suggested by Schneider et al. [40].
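The selection procedure above amounts to scoring every (learning rate, number of steps) pair on the held-out validation corruptions and keeping the best one. The sketch below assumes the score is mean accuracy across the validation corruptions; the `evaluate` callable, the corruption names, and the grid values are hypothetical placeholders, not the grid actually used.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple

Config = Tuple[float, int]  # (learning rate, number of gradient steps)


def select_hyperparameters(
    evaluate: Callable[[str, float, int], float],  # hypothetical: accuracy on one corruption
    corruptions: Sequence[str],
    learning_rates: Sequence[float],
    num_steps: Sequence[int],
) -> Tuple[Config, Dict[Config, float]]:
    """Exhaustive grid search; returns the best config and all mean scores."""
    scores: Dict[Config, float] = {}
    for lr, steps in itertools.product(learning_rates, num_steps):
        # Average validation accuracy over the disjoint validation corruptions.
        accs = [evaluate(c, lr, steps) for c in corruptions]
        scores[(lr, steps)] = sum(accs) / len(accs)
    best = max(scores, key=lambda cfg: scores[cfg])
    return best, scores


# Toy usage with a made-up evaluator (placeholder corruption names).
toy_eval = lambda c, lr, steps: 1.0 - abs(lr - 1e-3) - 0.01 * steps
best, _ = select_hyperparameters(toy_eval, ["val_a", "val_b", "val_c", "val_d"],
                                 [1e-4, 1e-3, 1e-2], [1, 2])
print(best)
```

The same selected configuration would then be reused unchanged on benchmarks that only provide test sets.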
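For single point BN adaptation, the prior strength N controls how strongly the training-set batch norm statistics are weighted against statistics computed from the test input. A minimal sketch, assuming the mixing rule mean = N/(N+n) * mean_train + n/(N+n) * mean_test (and likewise for the variance) that we understand Schneider et al. [40] to use, with n = 1 test sample; the function and argument names are hypothetical.

```python
import numpy as np


def adapt_bn_statistics(source_mean: np.ndarray, source_var: np.ndarray,
                        test_activations: np.ndarray, prior_strength: float = 16,
                        num_test_samples: int = 1):
    """Mix training (source) BN statistics with test-time statistics.

    test_activations: shape (num_positions, channels); for a conv layer and a
    single image, the spatial positions are flattened into the first axis.
    """
    test_mean = test_activations.mean(axis=0)
    test_var = test_activations.var(axis=0)  # biased (population) variance
    # Weight on the source statistics: N / (N + n).
    w = prior_strength / (prior_strength + num_test_samples)
    mean = w * source_mean + (1.0 - w) * test_mean
    var = w * source_var + (1.0 - w) * test_var
    return mean, var


mean, var = adapt_bn_statistics(np.zeros(3), np.ones(3), np.full((5, 3), 17.0))
print(mean, var)  # test mean shrunk toward 0 by the prior
```

With N = 16 and n = 1, the test statistics receive weight 1/17, so a single input can only shift the normalization moderately away from the training statistics.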