Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models
