PromptDistill: Query-based Selective Token Retention in Intermediate Layers for Efficient Large Language Model Inference
