SlimGPT: Layer-wise Structured Pruning for Large Language Models

Gui Ling, Ziyang Wang, Yuliang Yan

Neural Information Processing Systems 

Structured pruning is an effective way to balance model performance with efficiency, but restoring performance under limited computational resources remains a principal challenge in pruning LLMs. We therefore present SlimGPT, a low-cost and fast structured pruning method for LLMs based on the Optimal Brain Surgeon framework.
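For readers unfamiliar with the Optimal Brain Surgeon (OBS) framework named above, the following is a minimal sketch of the classic single-weight OBS step, not SlimGPT's algorithm. It assumes a loss that is locally quadratic around the trained weights and that the inverse Hessian is already available; the function name `obs_prune_one` and its arguments are hypothetical, chosen only for illustration.

```python
def obs_prune_one(w, h_inv):
    """Classic OBS step (illustrative; not the SlimGPT procedure).

    w:     list of trained weights
    h_inv: inverse Hessian of the loss w.r.t. w, as a list of rows
    Returns (pruned index, updated weights, estimated loss increase).
    """
    n = len(w)
    # Estimated loss increase from removing weight i: w_i^2 / (2 * [H^-1]_ii)
    saliency = [w[i] ** 2 / (2.0 * h_inv[i][i]) for i in range(n)]
    q = min(range(n), key=lambda i: saliency[i])
    # Compensating update for all weights: -(w_q / [H^-1]_qq) * H^-1[:, q]
    scale = w[q] / h_inv[q][q]
    w_new = [w[i] - scale * h_inv[i][q] for i in range(n)]
    w_new[q] = 0.0  # force an exact zero despite floating-point rounding
    return q, w_new, saliency[q]


# Toy example: Hessian H = [[2, 1], [1, 2]], so H^-1 = (1/3) * [[2, -1], [-1, 2]]
h_inv = [[2 / 3, -1 / 3], [-1 / 3, 2 / 3]]
q, w_new, cost = obs_prune_one([1.0, 0.1], h_inv)
```

In this toy case the smaller weight is removed and the remaining weight is adjusted to absorb part of the error. A structured method would have to extend this idea from single weights to whole groups such as channels or heads; how SlimGPT does so is not described in this abstract.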
