Prefixing Attention Sinks can Mitigate Activation Outliers for Large Language Model Quantization
