AITopics | pruning method

Collaborating Authors

pruning method

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Discovering Important Experts for Mixture-of-Experts Models Pruning Through a Theoretical Perspective

Neural Information Processing SystemsJun-22-2026, 17:26:50 GMT

Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models but face prohibitive memory demands due to massive parameterization. Existing pruning methods rely on heuristic metrics or impractical enumeration of expert subsets, leading to suboptimal performance or scalability. In this paper, we propose Shapley-MoE, an efficient pruning method for MoE models inspired by cooperative game theory. By quantifying each expert's contribution via Shapley value, our method identifies important experts without exhaustive combination evaluations. To overcome the NP-hard complexity of exact Shapley computation, we introduce a Monte Carlo sampling strategy for efficient approximation that reduces complexity to quadratic time. However, vanilla Monte Carlo sampling still faces issues of insufficient estimation accuracy and low sampling efficiency.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(2 more...)

Add feedback

The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

Neural Information Processing SystemsJun-17-2026, 03:32:55 GMT

Sparse neural networks promise efficiency, yet training them effectively remains a fundamental challenge. Despite advances in pruning methods that create sparse architectures, understanding why some sparse structures are better trainable than others with the same level of sparsity remains poorly understood. Aiming to develop a systematic approach to this fundamental problem, we propose a novel theoretical framework based on the theory of graph limits, particularly graphons, that characterizes sparse neural networks in the infinite-width regime. Our key insight is that connectivity patterns of sparse neural networks induced by pruning methods converge to specific graphons as networks' width tends to infinity, which encodes implicit structural biases of different pruning methods. We postulate the Graphon Limit Hypothesis and provide empirical evidence to support it. Leveraging this graphon representation, we derive a Graphon Neural Tangent Kernel (Graphon NTK) to study the training dynamics of sparse networks in the infinite width limit. Graphon NTK provides a general framework for the theoretical analysis of sparse networks. We empirically show that the spectral analysis of Graphon NTK correlates with observed training dynamics of sparse networks, explaining the varying convergence behaviours of different pruning methods. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.

artificial intelligence, machine learning, neural network, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Attribution-Driven Adaptive Token Pruning for Transformers

Neural Information Processing SystemsJun-17-2026, 02:26:39 GMT

Transformers have been widely adopted in natural language processing, computer vision, and other domains due to their exceptional performance across a variety of tasks. However, the computational cost of Transformers is prohibitively high, particularly when handling long input sequences, significantly increasing both training and inference time. Although various token pruning methods have been proposed to reduce the computational burden of Transformers, most approaches overlook critical differences in sequences in terms of length and complexity, leading to suboptimal compression efficiency. In this paper, we propose AD-TP, an Attribution-Driven Adaptive Token Pruning method designed to retain only the most informative tokens. We analyze the performance of using accumulated attention values to measure token importance and find that attention values do not accurately reflect the actual contribution of each token to text understanding.

contribution, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs

Neural Information Processing SystemsJun-15-2026, 16:40:50 GMT

In multimodal large language models (MLLMs), the length of input visual tokens is often significantly greater than that of their textual counterparts, leading to a high inference cost. Many works aim to address this issue by removing redundant visual tokens. However, current approaches either rely on attention-based pruning, which retains numerous duplicate tokens, or use similarity-based pruning, overlooking the instruction relevance, consequently causing suboptimal performance. In this paper, we go beyond attention or similarity by proposing a novel visual token pruning method named CDPruner, which maximizes the conditional diversity of retained tokens. We first define the conditional similarity between visual tokens conditioned on the instruction, and then reformulate the token pruning problem with determinantal point process (DPP) to maximize the conditional diversity of the selected subset. The proposed CDPruner is training-free and model-agnostic, allowing easy application to various MLLMs. Extensive experiments across diverse MLLMs show that CDPruner establishes new state-of-the-art on various visionlanguage benchmarks. By maximizing conditional diversity through DPP, the selected subset better represents the input images while closely adhering to user instructions, thereby preserving strong performance even with high reduction ratios. When applied to LLaVA, CDPruner reduces FLOPs by 95% and CUDA latency by 78%, while maintaining 94% of the original accuracy.

cdpruner, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe (0.45)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Discovering Important Experts for Mixture-of-Experts Models Pruning Through a Theoretical Perspective

Neural Information Processing SystemsJun-14-2026, 03:43:54 GMT

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Game Theory (0.98)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.39)

Add feedback

DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

Neural Information Processing SystemsJun-12-2026, 04:17:49 GMT

Despite the significant breakthrough of Mixture-of-Experts (MoE), the increasing scale of these MoE models presents huge memory and storage challenges. Existing MoE pruning methods, which involve reducing parameter size with a uniform sparsity across all layers, often lead to suboptimal outcomes and performance degradation due to varying expert redundancy in different MoE layers. To address this, we propose a non-uniform pruning strategy, dubbed Differentiable Expert Pruning (DiEP), which adaptively adjusts pruning rates at the layer level while jointly learning inter-layer importance, effectively capturing the varying redundancy across different MoE layers. By transforming the global discrete search space into a continuous one, our method handles exponentially growing non-uniform expert combinations, enabling adaptive gradient-based pruning. Extensive experiments on five advanced MoE models demonstrate the efficacy of our method across various NLP tasks. Notably, \textbf{DiEP} retains around 92\% of original performance on Mixtral 8$\times$7B with only half the experts, outperforming other pruning methods by up to 7.1% on the challenging MMLU dataset.

artificial intelligence, name change, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis

Neural Information Processing SystemsJun-12-2026, 02:28:54 GMT

Sparse neural networks promise efficiency, yet training them effectively remains a fundamental challenge. Despite advances in pruning methods that create sparse architectures, understanding why some sparse structures are better trainable than others with the same level of sparsity remains poorly understood. Aiming to develop a systematic approach to this fundamental problem, we propose a novel theoretical framework based on the theory of graph limits, particularly graphons, that characterizes sparse neural networks in the infinite-width regime. Our key insight is that connectivity patterns of sparse neural networks induced by pruning methods converge to specific graphons as networks' width tends to infinity, which encodes implicit structural biases of different pruning methods. We postulate the and provide empirical evidence to support it. Leveraging this graphon representation, we derive a to study the training dynamics of sparse networks in the infinite width limit. Graphon NTK provides a general framework for the theoretical analysis of sparse networks. We empirically show that the spectral analysis of Graphon NTK correlates with observed training dynamics of sparse networks, explaining the varying convergence behaviours of different pruning methods. Our framework provides theoretical insights into the impact of connectivity patterns on the trainability of various sparse network architectures.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.76)

Add feedback

ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization

Neural Information Processing SystemsJun-10-2026, 22:24:46 GMT

We introduce ReplaceMe, a generalized training-free depth pruning method that effectively replaces transformer blocks with a linear operation, while maintaining high performance for low compression ratios. In contrast to conventional pruning approaches that require additional training or fine-tuning, our approach requires only a small calibration dataset that is used to estimate a linear transformation, which approximates the pruned blocks. The estimated linear mapping can be seam-lessly merged with the remaining transformer blocks, eliminating the need for any additional network parameters. Our experiments show that ReplaceMe consistently outperforms other training-free approaches and remains highly competitive with state-of-the-art pruning methods that involve extensive retraining/fine-tuning and architectural modifications. Applied to several large language models (LLMs), ReplaceMe achieves up to 25% pruning while retaining approximately 90% of the original model's performance on open benchmarks--without any training or healing steps, resulting in minimal computational overhead.

artificial intelligence, large language model, natural language, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

Add feedback

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

He, Jiangyang, Zhu, Shaolin, Xiong, Deyi

arXiv.org Machine LearningJun-10-2026

Mixture-of-Experts large language models (LLMs) scale efficiently through sparse activation, yet their deployment is fundamentally constrained by the large static parameter footprint of experts. Existing compression approaches either remove entire experts, disrupting routing topology and harming performance, or rely on unstructured weight pruning with limited practical efficiency. To address the limitations, we propose TENP, a structured Trapezoidal ExpertNeuron Pruning framework. Using a few samples, we identify and retain important experts, while applying expert neuron pruning (ENP) to less important experts, reserving model parameters in a trapezoidal pattern from shallow to deep layers. When evaluating expert importance, we jointly consider both the magnitude of the expert output and its ability to change the direction of the input vector. For ENP, we measure each neuron's projected contribution to the expert output to identify and retain important neurons. We conduct extensive experiments on the Qwen and DeepSeek models. Under a routing expert sparsity of 40% and an average of 63.76% activated expert parameters, the DeepSeek model suffers only a 1-point drop in accuracy compared to the full-parameter model. Moreover, it outperforms the full-parameter model by 10% on code generation tasks.

large language model, machine learning, pruning, (20 more...)

arXiv.org Machine Learning

2606.09885

Country: