Self-Selected Attention Span for Accelerating Large Language Model Inference
