Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models
Lujun Li
Neural Information Processing Systems
In this paper, we present DSA, the first automated framework for discovering sparsity allocation schemes for layer-wise pruning in Large Language Models (LLMs). LLMs have become increasingly powerful, but their large parameter counts make them computationally expensive. Existing pruning methods for compressing LLMs primarily focus on evaluating redundancies and removing element-wise weights. However, these methods fail to allocate adaptive layer-wise sparsities, leading to performance degradation in challenging tasks.
Oct-10-2025, 22:49:35 GMT
- Country:
- Asia > China
- Guangdong Province
- Heilongjiang Province > Harbin (0.04)
- Hong Kong (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Technology: