SlimGPT: Layer-wise Structured Pruning for Large Language Models
Gui Ling, Ziyang Wang, Yuliang Yan
Neural Information Processing Systems
Structured pruning is an effective way to balance model performance with efficiency, but restoring performance under computational resource constraints is a principal challenge when pruning LLMs. To address this, we present SlimGPT, a low-cost and fast structured pruning method for LLMs based on the Optimal Brain Surgeon framework.
Oct-10-2025, 15:39:23 GMT