VCC: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens

Neural Information Processing Systems 

Transformers are central to modern natural language processing and computer vision applications.