AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Neural Information Processing Systems

Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning strategies typically assign uniform pruning ratios across layers, limiting overall pruning ability; and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance. In this paper, we leverage Heavy-Tailed Self-Regularization (HT-SR) Theory, in particular the shape of empirical spectral densities (ESDs) of weight matrices, to design improved layerwise pruning ratios for LLMs. Our analysis reveals a wide variability in how well-trained, and thus relatedly how prunable, different layers of an LLM are. Based on this, we propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically-principled manner. AlphaPruning can be used in conjunction with multiple existing LLM pruning methods. Our empirical results show that AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, marking a first in the literature on LLMs.
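
A minimal sketch of the allocation idea described in the abstract, assuming a Hill-style estimate of each layer's ESD tail exponent and a simple linear mapping from those exponents to per-layer sparsities; the function names (esd_alpha_hill, allocate_sparsities), the k_frac/spread parameters, and the mapping itself are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (not the authors' implementation): score each layer by the
# heavy-tailedness of its weight ESD, then map the scores linearly to
# per-layer sparsities around a target mean sparsity.
import numpy as np
import torch

def esd_alpha_hill(weight: torch.Tensor, k_frac: float = 0.1) -> float:
    """Hill-style estimate of the ESD tail exponent of W (eigenvalues of W^T W)."""
    eigs = torch.linalg.svdvals(weight.float()) ** 2          # ESD eigenvalues
    eigs = torch.sort(eigs, descending=True).values
    k = max(int(k_frac * eigs.numel()), 2)                    # size of the tail
    tail = eigs[:k]
    log_ratio = torch.log(tail / tail[k - 1]).sum().item()
    return 1.0 + k / (log_ratio + 1e-12)

def allocate_sparsities(alphas, target=0.7, spread=0.2):
    """Lower alpha (heavier tail, better-trained layer) -> prune that layer less."""
    a = np.asarray(alphas, dtype=np.float64)
    scaled = (a - a.min()) / (a.max() - a.min() + 1e-8)       # 0 = heaviest tail
    return np.clip(target - spread / 2 + spread * scaled, 0.0, 0.99)
```

In this sketch the layer-wise sparsities average roughly to the target, so the overall compression budget is preserved while better-trained layers are pruned more gently.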


TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning

Beck, Florentin, Rudman, William, Eickhoff, Carsten

arXiv.org Artificial Intelligence

Large Language Models (LLMs) present significant computational and memory challenges due to their extensive size, making pruning essential for their efficient deployment. Existing one-shot pruning methods often apply uniform sparsity constraints across layers or within each layer, resulting in suboptimal performance, especially at high sparsity ratios. This work introduces TRIM (Targeted Row-wise Iterative Metric-driven pruning), a novel approach that applies varying sparsity ratios to individual output dimensions (rows) within each layer. TRIM employs an iterative adjustment process guided by quality metrics to optimize dimension-wise sparsity allocation, focusing on reducing variance in quality retention across outputs to preserve critical information. TRIM can be seamlessly integrated with existing layer-wise pruning strategies. Our evaluations on perplexity and zero-shot tasks across diverse LLM families (Qwen2.5, LLaMA-2, and OPT) and sparsity levels demonstrate that TRIM achieves new state-of-the-art results and enhances stability. For instance, at 80% sparsity, TRIM reduces perplexity by 48% for Qwen2.5-14B and over 90% for OPT-13B compared to baseline methods. We conclude that fine-grained, dimension-wise sparsity adaptation is crucial for pushing the limits of extreme LLM compression. Code available at: https://github.com/flobk/TRIM
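
A rough sketch of the row-wise idea: each output row receives its own sparsity, and an iterative loop shifts sparsity away from rows that retain less of a saliency metric, keeping the layer-level budget roughly fixed. The saliency metric (a Wanda-style |W| times activation norm), step size, and stopping rule below are assumptions for illustration, not TRIM's exact procedure.

```python
# Rough sketch (illustrative assumptions throughout): give each output row its
# own sparsity and iteratively move sparsity away from rows that lose the most
# saliency, so quality retention across rows becomes more uniform.
import torch

def row_quality(metric, sparsity):
    """Fraction of saliency mass each row keeps at its current sparsity."""
    quality = []
    for row, s in zip(metric, sparsity):
        k = max(int((1.0 - s.item()) * row.numel()), 1)       # weights kept in the row
        quality.append((row.topk(k).values.sum() / (row.sum() + 1e-8)).item())
    return torch.tensor(quality)

def prune_rowwise(weight, act_norm, layer_sparsity=0.8, steps=10, lr=0.02):
    """weight: (out, in); act_norm: (in,) per-input-channel activation norms."""
    metric = weight.abs() * act_norm                          # Wanda-style saliency
    sparsity = torch.full((weight.shape[0],), layer_sparsity)
    for _ in range(steps):
        retention = row_quality(metric, sparsity)
        # rows retaining more quality absorb extra sparsity; the zero-mean update
        # keeps the layer-level average sparsity roughly fixed
        sparsity = (sparsity + lr * (retention - retention.mean())).clamp(0.0, 0.99)
    pruned = weight.clone()
    for i, s in enumerate(sparsity):                          # apply per-row masks
        k = int(s.item() * weight.shape[1])
        if k > 0:
            pruned[i, metric[i].topk(k, largest=False).indices] = 0.0
    return pruned
```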



Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias

Hu, Yuanzhe, Goel, Kinshuk, Killiakov, Vlad, Yang, Yaoqing

arXiv.org Artificial Intelligence

Diagnosing deep neural networks (DNNs) by analyzing the eigenspectrum of their weights has been an active area of research in recent years. One of the main approaches involves measuring the heavy-tailedness of the empirical spectral densities (ESDs) of weight matrices. This analysis has been shown to provide insights to help diagnose whether a model is well-trained or undertrained, and has been used to guide training methods involving layer-wise hyperparameter assignment. In this paper, we address an often-overlooked challenge in estimating the heavy-tailedness of these ESDs: the impact of the aspect ratio of weight matrices. We demonstrate that matrices of varying sizes (and aspect ratios) introduce a non-negligible bias in estimating the heavy-tailedness of ESDs, leading to inaccurate model diagnosis and layer-wise hyperparameter assignment. To overcome this challenge, we propose FARMS (Fixed-Aspect-Ratio Matrix Subsampling), a method that normalizes the weight matrices by subsampling submatrices with a fixed aspect ratio. Instead of measuring the heavy-tailedness of the original ESD, we measure the average ESD of these subsampled submatrices. We show that this method effectively mitigates the aspect ratio bias. We validate our approach across various optimization techniques and application domains that involve eigenspectrum analysis of weights, including image classification in computer vision (CV) models, scientific machine learning (SciML) model training, and large language model (LLM) pruning. Our results show that despite its simplicity, FARMS uniformly improves the accuracy of eigenspectrum analysis while enabling more effective layer-wise hyperparameter assignment. In one of the LLM pruning experiments, FARMS reduces the perplexity of the LLaMA-7B model by 17.3% when compared with state-of-the-art methods.
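
A short sketch of the subsampling step, assuming random submatrices with a fixed aspect ratio whose ESDs are averaged before tail estimation; the window sizes, number of samples, and downstream estimator are illustrative choices, not the paper's exact procedure.

```python
# Short sketch (parameters are illustrative): average the ESD over random
# submatrices with a fixed aspect ratio before estimating heavy-tailedness.
import torch

def farms_esd(weight, rows=128, aspect_ratio=4.0, n_samples=16, seed=0):
    """Averaged, sorted eigenvalues of W_sub^T W_sub over fixed-aspect-ratio submatrices."""
    g = torch.Generator().manual_seed(seed)
    out_dim, in_dim = weight.shape
    r_dim = min(rows, out_dim)
    c_dim = min(int(r_dim * aspect_ratio), in_dim)
    samples = []
    for _ in range(n_samples):
        r = torch.randperm(out_dim, generator=g)[:r_dim]
        c = torch.randperm(in_dim, generator=g)[:c_dim]
        sub = weight[r][:, c].float()
        eigs = torch.linalg.svdvals(sub) ** 2                 # ESD of the submatrix
        samples.append(torch.sort(eigs, descending=True).values)
    return torch.stack(samples).mean(dim=0)                   # aspect-ratio-controlled spectrum
```

The averaged spectrum can then be fed to the same kind of tail-exponent estimator sketched earlier, in place of the full-matrix ESD, so that layers of different shapes are compared on equal footing.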



AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

Lu, Haiquan, Zhou, Yefan, Liu, Shiwei, Wang, Zhangyang, Mahoney, Michael W., Yang, Yaoqing

arXiv.org Machine Learning

Recent work on pruning large language models (LLMs) (Frantar and Alistarh, 2023a; Jaiswal et al., 2023; Sun et al., 2023) has shown the ability to reduce the number of parameters significantly without compromising performance, resulting in notable savings in memory footprint, computing time, and energy consumption. Unlike pre-LLM pruning methods (Kurtic et al., 2022; Sanh et al., 2020), existing LLM pruning approaches typically allocate the "sparsity budget" (i.e., the number of pruned parameters or pruning ratios) uniformly across layers, making it difficult to increase sparsity to very high levels. Relatively little effort has been put into developing theoretically-principled ways to compute layerwise pruning ratios. For example, the Outlier Weighed Layerwise sparsity (OWL) method (Yin et al., 2023) uses a nonuniform layerwise sparsity based on the distribution of outlier activations. However, OWL relies on heuristics related to the presence of outliers (Dettmers et al., 2022; Kovaleva et al., 2021; Puccetti et al., 2022). This can lead to suboptimal performance in the absence of outliers and can make it difficult to achieve very aggressive levels of sparsity. For example, Yin et al. (2023) show that pruning LLMs to 80% sparsity often significantly degrades their prediction performance.
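
For context, a heavily simplified sketch of the outlier-based allocation that this passage contrasts with: layers whose activations contain more large outliers are pruned less aggressively. The outlier threshold, metric, and linear mapping below are assumptions for illustration and omit OWL's actual weight-and-activation formulation.

```python
# Heavily simplified sketch (threshold, metric, and mapping are assumptions and
# differ from OWL's actual formulation): layers with more activation outliers
# receive lower sparsity.
import numpy as np

def outlier_based_sparsities(layer_acts, target=0.7, spread=0.2, m=5.0):
    """layer_acts: list of flattened per-layer activation magnitudes."""
    outlier_ratio = []
    for act in layer_acts:
        a = np.abs(np.asarray(act, dtype=np.float64))
        outlier_ratio.append(float((a > m * a.mean()).mean()))  # fraction of outliers
    r = np.asarray(outlier_ratio)
    scaled = (r - r.min()) / (r.max() - r.min() + 1e-8)          # 1 = most outliers
    return np.clip(target + spread / 2 - spread * scaled, 0.0, 0.99)
```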