Learning to Focus: Causal Attention Distillation via Gradient‐Guided Token Pruning
