AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang

arXiv.org Machine Learning

Recent work on pruning large language models (LLMs) (Frantar and Alistarh, 2023a; Jaiswal et al., 2023; Sun et al., 2023) has shown that the number of parameters can be reduced significantly without compromising performance, yielding notable savings in memory footprint, computing time, and energy consumption. Unlike pre-LLM pruning methods (Kurtic et al., 2022; Sanh et al., 2020), existing LLM pruning approaches typically allocate the "sparsity budget" (i.e., the number of pruned parameters or the pruning ratios) uniformly across layers, which makes it difficult to push sparsity to very high levels. Relatively little effort has been put into developing theoretically principled ways to compute layerwise pruning ratios. For example, the Outlier Weighed Layerwise sparsity (OWL) method (Yin et al., 2023) uses nonuniform layerwise sparsity based on the distribution of outlier activations. However, OWL relies on heuristics tied to the presence of outliers (Dettmers et al., 2022; Kovaleva et al., 2021; Puccetti et al., 2022), which can lead to suboptimal performance when outliers are absent and makes it difficult to reach very aggressive levels of sparsity. For example, Yin et al. (2023) show that pruning LLMs to 80% sparsity often significantly degrades their prediction performance.

First two authors contributed equally.
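
To make the notion of a layerwise sparsity budget concrete, the sketch below contrasts a uniform allocation with a nonuniform one driven by per-layer scores. This is a minimal illustration under assumed names and values, not the OWL or AlphaPruning allocation rule; the function `allocate_layerwise_sparsity`, the example scores, and the `max_shift` heuristic are hypothetical.

```python
import numpy as np

def allocate_layerwise_sparsity(layer_scores, target_sparsity, max_shift=0.1):
    """Toy allocation of per-layer pruning ratios (hypothetical).

    Layers with higher `layer_scores` (e.g., more outlier activations, or a
    heavier-tailed weight spectrum) are pruned less aggressively, while the
    mean ratio stays at `target_sparsity` (up to clipping into [0, 1]).
    """
    scores = np.asarray(layer_scores, dtype=float)
    # Center scores around their mean; dividing by the max magnitude keeps
    # the values in [-1, 1] and preserves the zero mean.
    centered = scores - scores.mean()
    if np.abs(centered).max() > 0:
        centered = centered / np.abs(centered).max()
    # Shift in the opposite direction: high score -> lower pruning ratio.
    ratios = target_sparsity - max_shift * centered
    return np.clip(ratios, 0.0, 1.0)

# Uniform baseline: every layer pruned at the global target (e.g., 70%).
uniform = np.full(4, 0.7)

# Nonuniform allocation driven by hypothetical per-layer scores.
scores = [3.0, 1.0, 0.5, 2.0]
print(allocate_layerwise_sparsity(scores, target_sparsity=0.7))
```

Any such rule still has to respect the global budget (the mean of the per-layer ratios); the difference between methods lies in which per-layer statistic drives the deviation from uniformity.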