Appendix

Additional details and results from the different sections are included below.

For vision transformers, we train linear probes on the representations of individual tokens, or on the representation averaged over all tokens, at the output of different transformer layers (each layer meaning a full transformer block, including self-attention and MLP). Because the probe is a linear model, its solution can be recovered efficiently in closed form.

The top panel shows the CKA heatmap for ViT-B/32, where we can also observe strong similarity between lower and higher layers and the grid-like, uniform representation structure.

In Figures C.1, C.2, and C.3, we provide full plots of the effective receptive fields of all layers of ViT-B/32, ResNet-50, and ViT-L/16, taken after the residual connections, as in Figure 6 in the main text.
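To make the probing setup concrete, the following is a minimal sketch of a closed-form linear probe fit by ridge regression on frozen layer representations. The function names, the regularization constant, and the one-hot regression formulation are illustrative assumptions, not taken from the paper's code; the only property it is meant to convey is that the probe weights are obtained in closed form rather than by iterative training.

```python
import numpy as np

def fit_linear_probe(features, labels, num_classes, reg=1e-3):
    """Fit a linear probe by ridge regression in closed form.

    features: (num_examples, dim) layer representations, e.g. one token's
        output or the mean over all tokens of a transformer block.
    labels:   (num_examples,) integer class labels.
    Returns a (dim + 1, num_classes) weight matrix (bias included).
    """
    # Append a constant feature so the probe also learns a bias term.
    X = np.concatenate([features, np.ones((features.shape[0], 1))], axis=1)
    # One-hot targets turn the multi-class probe into a multi-output regression.
    Y = np.eye(num_classes)[labels]
    # Closed-form ridge solution: W = (X^T X + reg * I)^{-1} X^T Y.
    gram = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def probe_accuracy(W, features, labels):
    """Evaluate the probe: predict the class with the largest regression score."""
    X = np.concatenate([features, np.ones((features.shape[0], 1))], axis=1)
    preds = np.argmax(X @ W, axis=1)
    return float(np.mean(preds == labels))
```

The same routine can be applied layer by layer to either the per-token or the token-averaged representations, yielding one probe accuracy per layer.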
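For the CKA heatmaps, the sketch below shows standard linear CKA computed on full representation matrices, which is an assumption for illustration; in practice one may instead use a minibatch estimator, and the helper names are hypothetical. Each entry of the heatmap compares the representations of one pair of layers over the same set of examples.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (num_examples, dim_x), Y: (num_examples, dim_y), same examples in the
    same order. Returns a scalar in [0, 1].
    """
    # Center each feature dimension across examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear-kernel HSIC terms expressed via feature cross-covariances.
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)

def cka_heatmap(layer_reps):
    """Pairwise CKA matrix over a list of per-layer representation arrays."""
    n = len(layer_reps)
    heatmap = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            heatmap[i, j] = linear_cka(layer_reps[i], layer_reps[j])
    return heatmap
```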
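For the effective receptive field plots, a gradient-based sketch is given below: it backpropagates from a single unit at a chosen layer (taken after the residual connection) to the input pixels and aggregates the gradient magnitude. The `model_up_to_layer` callable and the handling of token-shaped versus spatial activations are assumptions made for illustration, not the paper's implementation.

```python
import torch

def effective_receptive_field(model_up_to_layer, images, position):
    """Gradient-based effective receptive field for one unit.

    model_up_to_layer: callable mapping an image batch (B, C, H, W) to the
        activations of the layer of interest, either (B, tokens, dim) for a
        transformer block or (B, dim, H', W') for a convolutional stage.
    images: (B, C, H, W) input batch.
    position: token index (int) or (row, col) tuple of the centre unit.
    Returns an (H, W) map of mean absolute input gradients.
    """
    images = images.clone().requires_grad_(True)
    activations = model_up_to_layer(images)
    # Sum the chosen unit's activations over batch and feature dimensions,
    # then backpropagate to the input pixels.
    if activations.dim() == 3:          # transformer-style (B, tokens, dim)
        target = activations[:, position, :].sum()
    else:                               # conv-style (B, dim, H', W')
        target = activations[:, :, position[0], position[1]].sum()
    target.backward()
    # Aggregate gradient magnitude over the batch and colour channels.
    return images.grad.abs().mean(dim=(0, 1))
```

Repeating this for every layer and normalizing each map produces the per-layer receptive field grids shown in Figures C.1, C.2, and C.3.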