SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning