Appendix of "Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering"

Neural Information Processing Systems 

An image is worth 16x16 words: Transformers for image recognition at scale.