Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
