AITopics | pruning mask

Collaborating Authors

pruning mask

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PDP: Parameter-free Differentiable Pruning is All You Need

Neural Information Processing SystemsOct-9-2025, 01:21:49 GMT

Hence, a desirable pruning algorithm should achieve high accuracy and accelerate inference for various types of networks without significant training overheads in costs and complexity.

machine learning, natural language, pruning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

StructPrune: Structured Global Pruning asymptotics with $\mathcal{O}(\sqrt{N})$ GPU Memory

Song, Xinyuan, Bai, Guangji, Zhao, Liang

arXiv.org Artificial IntelligenceOct-7-2025

Pruning is critical for scaling large language models (LLMs). Global pruning achieves strong performance but requires $\mathcal{O}(N)$ memory, which is infeasible for billion-parameter models. Local pruning reduces GPU memory usage to that of a single layer by pruning layers independently, but it neglects inter-layer dependencies and often leads to suboptimal performance in high-sparsity regimes. Unlike unstructured pruning, structured pruning produces regular sparsity patterns that align well with GPU kernels and library optimizations, making it more hardware-efficient. However, structured pruning typically relies on global pruning, since structured patterns are more prone to severe performance degradation under local optimization. To jointly achieve structured pruning and the memory efficiency of local pruning, we propose a divide-and-conquer strategy that decomposes the global pruning problem into coordinated subproblems across different modules, each of which fits within limited GPU memory. Building on this idea, we design \textbf{STRUPRUNE}, an ADMM-based framework that integrates structured sparsity into the pruning process, combining the memory efficiency of local pruning with the hardware compatibility of structured methods. We derive a closed-form analytical solution for structured pruning masks that provides an explicit rule for layer-wise sparsity allocation, and further develop an energy-based asymptotic framework yielding a softmax-form allocation scheme that simplifies optimization while adapting to heterogeneous layer importance. Experiments demonstrate that STRUPRUNE matches the perplexity of global structured pruning while reducing memory cost from $\mathcal{O}(N)$ to $\mathcal{O}(\sqrt{N})$, enabling practical deployment at the billion-parameter scale.

large language model, machine learning, pruning, (20 more...)

arXiv.org Artificial Intelligence

2510.03246

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning

Balemans, Dieter, Huybrechts, Thomas, Steckel, Jan, Mercelis, Siegfried

arXiv.org Artificial IntelligenceSep-16-2025

This paper presents a novel approach to neural network pruning by integrating a graph-based observation space into an AutoML framework to address the limitations of existing methods. Traditional pruning approaches often depend on hand-crafted heuristics and local optimization perspectives, which can lead to suboptimal performance and inefficient pruning strategies. Our framework transforms the pruning process by introducing a graph representation of the target neural network that captures complete topological relationships between layers and channels, replacing the limited layer-wise observation space with a global view of network structure. The core innovations include a Graph Attention Network (GAT) encoder that processes the network's graph representation and generates a rich embedding. Additionally, for the action space we transition from continuous pruning ratios to fine-grained binary action spaces which enables the agent to learn optimal channel importance criteria directly from data, moving away from predefined scoring functions. These contributions are modelled within a Constrained Markov Decision Process (CMDP) framework, allowing the agent to make informed pruning decisions while adhering to resource constraints such as target compression rates. For this, we design a self-competition reward system that encourages the agent to outperform its previous best performance while satisfying the defined constraints. We demonstrate the effectiveness of our approach through extensive experiments on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet. The experiments show that our method consistently outperforms traditional pruning techniques, showing state-of-the-art results while learning task-specific pruning strategies that identify functionally redundant connections beyond simple weight magnitude considerations.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2509.10526

Country: Europe > Belgium (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression

Ilin, Ivan, Richtarik, Peter

arXiv.org Artificial IntelligenceApr-9-2025

This paper presents Thanos, a novel weight-pruning algorithm designed to reduce the memory footprint and enhance the computational efficiency of large language models (LLMs) by removing redundant weights while maintaining accuracy. Thanos introduces a block-wise pruning strategy with adaptive masks that dynamically adjust to weight importance, enabling flexible sparsity patterns and structured formats, such as $n:m$ sparsity, optimized for hardware acceleration. Experimental evaluations demonstrate that Thanos achieves state-of-the-art performance in structured pruning and outperforms existing methods in unstructured pruning. By providing an efficient and adaptable approach to model compression, Thanos offers a practical solution for deploying large models in resource-constrained environments.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2504.05346

Genre: Research Report > Promising Solution (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures

Qin, Jiayu, Tan, Jianchao, Zhang, Kefeng, Cai, Xunliang, Wang, Wei

arXiv.org Artificial IntelligenceFeb-19-2025

The remarkable performance of large language models (LLMs) in various language tasks has attracted considerable attention. However, the ever-increasing size of these models presents growing challenges for deployment and inference. Structured pruning, an effective model compression technique, is gaining increasing attention due to its ability to enhance inference efficiency. Nevertheless, most previous optimization-based structured pruning methods sacrifice the uniform structure across layers for greater flexibility to maintain performance. The heterogeneous structure hinders the effective utilization of off-the-shelf inference acceleration techniques and impedes efficient configuration for continued training. To address this issue, we propose a novel masking learning paradigm based on minimax optimization to obtain the uniform pruned structure by optimizing the masks under sparsity regularization. Extensive experimental results demonstrate that our method can maintain high performance while ensuring the uniformity of the pruned model structure, thereby outperforming existing SOTA methods.

arxiv preprint arxiv, language model, pruning, (14 more...)

arXiv.org Artificial Intelligence

2502.14008

Country:

North America > United States > Virginia (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Pruning Foundation Models for High Accuracy without Retraining

Zhao, Pu, Sun, Fei, Shen, Xuan, Yu, Pinrui, Kong, Zhenglun, Wang, Yanzhi, Lin, Xue

arXiv.org Artificial IntelligenceOct-20-2024

Despite the superior performance, it is challenging to deploy foundation models or large language models (LLMs) due to their massive parameters and computations. While pruning is a promising technique to reduce model size and accelerate the inference, the traditional pruning techniques can hardly be applied for LLMs as they need to finetune the model on the full dataset with multiple epochs consuming massive data and hardware resources. To deal with this problem, post-training pruning methods are proposed to prune LLMs in one-shot without retraining. However, their accuracy after pruning may suffer from certain performance degradation due to the lack of retraining with massive data. To address this issue, in this paper, we first formulate the post-training problem for layer-wise LLM compression to simultaneously prune multiple weights in LLMs. Next, we provide an optimal solution for this problem and design our post-training pruning algorithm for both unstructured and semi-structured sparsity. Our extensive experiments demonstrate the superior performance of the proposed methods in comparison to SOTA baselines across various LLM families including transformer-based LLMs and Mamba-based LLMs. Code link: https://github.com/piuzha/APT

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.15567

Country: North America > United States > New Jersey (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

FedMap: Iterative Magnitude-Based Pruning for Communication-Efficient Federated Learning

Herzog, Alexander, Southam, Robbie, Mavromatis, Ioannis, Khan, Aftab

arXiv.org Artificial IntelligenceJun-27-2024

Federated Learning (FL) is a distributed machine learning approach that enables training on decentralized data while preserving privacy. However, FL systems often involve resource-constrained client devices with limited computational power, memory, storage, and bandwidth. This paper introduces FedMap, a novel method that aims to enhance the communication efficiency of FL deployments by collaboratively learning an increasingly sparse global model through iterative, unstructured pruning. Importantly, FedMap trains a global model from scratch, unlike other methods reported in the literature, making it ideal for privacy-critical use cases such as in the medical and finance domains, where suitable pre-training data is often limited. FedMap adapts iterative magnitude-based pruning to the FL setting, ensuring all clients prune and refine the same subset of the global model parameters, therefore gradually reducing the global model size and communication overhead. The iterative nature of FedMap, forming subsequent models as subsets of predecessors, avoids parameter reactivation issues seen in prior work, resulting in stable performance. In this paper we provide an extensive evaluation of FedMap across diverse settings, datasets, model architectures, and hyperparameters, assessing performance in both IID and non-IID environments. Comparative analysis against the baseline approach demonstrates FedMap's ability to achieve more stable client model performance. For IID scenarios, FedMap achieves over $90$\% pruning without significant performance degradation. In non-IID settings, it achieves at least $~80$\% pruning while maintaining accuracy. FedMap offers a promising solution to alleviate communication bottlenecks in FL systems while retaining model accuracy.

fedmap, global model, pruning, (13 more...)

arXiv.org Artificial Intelligence

2406.1905

Country:

Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Promising Solution (0.68)
Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Xie, Jiamin, Li, Ke, Guo, Jinxi, Tjandra, Andros, Shangguan, Yuan, Sari, Leda, Wu, Chunyang, Jia, Junteng, Mahadeokar, Jay, Kalinli, Ozlem

arXiv.org Artificial IntelligenceJan-11-2024

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language. In this work, we propose the use of an adaptive masking approach in two scenarios for pruning a multilingual ASR model efficiently, each resulting in sparse monolingual models or a sparse multilingual model (named as Dynamic ASR Pathways). Our approach dynamically adapts the sub-network, avoiding premature decisions about a fixed sub-network structure. We show that our approach outperforms existing pruning methods when targeting sparse monolingual models. Further, we illustrate that Dynamic ASR Pathways jointly discovers and trains better sub-networks (pathways) of a single multilingual model by adapting from different sub-network initializations, thereby reducing the need for language-specific pruning.

multilingual asr model, pruning, pruning mask, (11 more...)

arXiv.org Artificial Intelligence

2309.13018

Country: North America > United States > Texas (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.57)

Add feedback

Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection

Li, Jianwei, Gao, Weizhi, Lei, Qi, Xu, Dongkuan

arXiv.org Artificial IntelligenceJan-10-2024

It is widely acknowledged that large and sparse models have higher accuracy than small and dense models under the same model size constraints. This motivates us to train a large model and then remove its redundant neurons or weights by pruning. Most existing works pruned the networks in a deterministic way, the performance of which solely depends on a single pruning criterion and thus lacks variety. Instead, in this paper, we propose a model pruning strategy that first generates several pruning masks in a designed random way. Subsequently, along with an effective mask-selection rule, the optimal mask is chosen from the pool of mask candidates. To further enhance efficiency, we introduce an early mask evaluation strategy, mitigating the overhead associated with training multiple masks. Our extensive experiments demonstrate that this approach achieves state-of-the-art performance across eight datasets from GLUE, particularly excelling at high levels of sparsity.

computational linguistic, pruning, randomness, (15 more...)

arXiv.org Artificial Intelligence

2310.13183

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > North Carolina (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(11 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Filters

Collaborating Authors

pruning mask

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

8f9f4eb32b9081a90f2a0b2627eb2a24-Paper-Conference.pdf

PDP: Parameter-free Differentiable Pruning is All You Need

StructPrune: Structured Global Pruning asymptotics with $\mathcal{O}(\sqrt{N})$ GPU Memory

Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning

Thanos: A Block-wise Pruning Algorithm for Efficient Large Language Model Compression

MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures

Pruning Foundation Models for High Accuracy without Retraining

FedMap: Iterative Magnitude-Based Pruning for Communication-Efficient Federated Learning

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection