Hamming Attention Distillation: Binarizing Keys and Queries for Efficient Long-Context Transformers
