Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference

Open in new window