VCC: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
