Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models

Lujun Li
