Supplementary materials for Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Anonymous Author(s)
Affiliation
Address
email

A Additional graphs from outlier analysis

Apr-30-2026, 05:24:42 GMT–Neural Information Processing Systems

Figure 1: A summary of several outlier statistics recorded from the ImageNet validation set on ViT. We use zero-based indexing for dimensions.

BERT

Recall from Figure 1 that all the outliers are present only in hidden dimensions #123, #180, #225, #308, #381, #526, and #720, with the majority in #180 and #720. In Figures 9 and 10 we show more examples of the discovered self-attention patterns for attention heads #3 and #12 (hidden dims #180 and #720, respectively). We also show self-attention patterns in attention heads and layers that are not associated with the outliers in Figures 11 and 12, respectively.
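The correspondence between outlier hidden dimensions and attention heads can be checked with a short sketch. It assumes a BERT-base layout (hidden size 768 split into 12 heads of 64 contiguous dimensions each), zero-based dimension indices, one-based head numbers, and a hypothetical magnitude threshold of 6.0 for calling a value an outlier. `head_of_dim` and `outlier_dims` are illustrative names, not the authors' code.

```python
# Minimal sketch, assuming a BERT-base layout: 768 hidden dimensions split
# into 12 attention heads of 64 contiguous dimensions each.
HIDDEN_SIZE = 768
NUM_HEADS = 12
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 64 dimensions per head


def head_of_dim(dim: int) -> int:
    """Return the 1-based attention head that owns a 0-based hidden dimension."""
    if not 0 <= dim < HIDDEN_SIZE:
        raise ValueError(f"dimension {dim} outside [0, {HIDDEN_SIZE})")
    return dim // HEAD_DIM + 1


def outlier_dims(activations, threshold=6.0):
    """Count, per hidden dimension, how many values satisfy |x| > threshold.

    `activations` is a list of token vectors, each of length HIDDEN_SIZE.
    The default threshold of 6.0 is a hypothetical choice for illustration.
    Returns {dim: count} for every dimension with at least one outlier.
    """
    counts = {}
    for vec in activations:
        for dim, x in enumerate(vec):
            if abs(x) > threshold:
                counts[dim] = counts.get(dim, 0) + 1
    return counts


if __name__ == "__main__":
    # Map each outlier dimension listed in the text to its attention head.
    for d in (123, 180, 225, 308, 381, 526, 720):
        print(d, head_of_dim(d))
```

Under these assumptions, dimension #180 falls in the range 128 to 191 and maps to head #3, while #720 falls in 704 to 767 and maps to head #12, which agrees with the pairing stated above.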
