Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads
