Leveraging KV Similarity for Online Structured Pruning in LLMs
