Appendix

Neural Information Processing Systems 

We do this for all combinations of blocks and tokens.

Class representations in image tokens across the hierarchy

Asterisks indicate a significant difference between both types of tokens.

We additionally conducted an analysis comparing the class similarity change rate of class- and context-labeled tokens in self-attention layers.

Figure 17: Agreement rate difference between correctly classified vs. misclassified samples.

Figure 18: Percentage of instances where the layer's final predictions match any of the top-5 predictions of the most activated memories.

AUC is better, while in the positive perturbation experiments (POS) a lower AUC is better.
