AITopics | pruning criterion

pruning criterion

Finding significant combinations of features in the presence of categorical covariates

Laetitia Papaxanthos, Felipe Llinares-López, Dean Bodenham, Karsten Borgwardt

Neural Information Processing Systems, Mar-23-2026, 00:51:33 GMT

In high-dimensional settings, where the number of features p is much larger than the number of samples n, methods that systematically examine arbitrary combinations of features have only recently begun to be explored. However, none of the current methods is able to assess the association between feature combinations and a target variable while conditioning on a categorical covariate. As a result, many false discoveries might occur due to unaccounted confounding effects. We propose the Fast Automatic Conditional Search (FACS) algorithm, a significant discriminative itemset mining method which conditions on categorical covariates and only scales as O(k log k), where k is the number of states of the categorical covariate. Based on the Cochran-Mantel-Haenszel Test, FACS demonstrates superior speed and statistical power on simulated and real-world datasets compared to the state of the art, opening the door to numerous applications in biomedicine.
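To make the conditioning step concrete, the following is a minimal sketch of the standard Cochran-Mantel-Haenszel statistic for a single feature combination, with one 2x2 table per state of the categorical covariate. The function name `cmh_test` and the table layout are assumptions made for illustration; this is only the textbook test (without continuity correction), not the FACS search or its pruning of untestable itemsets.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over stratified 2x2 tables.

    tables: list of (a, b, c, d) counts, one tuple per covariate state, where
      a = pattern present & case,  b = pattern present & control,
      c = pattern absent & case,   d = pattern absent & control.
    Returns (statistic, p_value); the statistic is chi-squared with 1 degree
    of freedom under the null hypothesis of no conditional association.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 samples carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = (observed - expected) ** 2 / variance
    # Survival function of chi-squared with 1 dof: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Two covariate states, each with its own contingency table.
stat, p_value = cmh_test([(10, 5, 5, 10), (8, 4, 4, 8)])
```

Because the expectation and variance are accumulated per stratum, an association that exists only between strata (that is, one explained by the covariate) does not inflate the statistic.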

artificial intelligence, criterion, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.31)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

2c572cad9ae98c5cb6f3fca040b2bc54-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 21:58:16 GMT

approximation, pruning, sparsity, (13 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Wisconsin (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Pruning for Sparse Diffusion Models based on Gradient Flow

Wan, Ben, Zheng, Tianyi, Chen, Zhaoyu, Wang, Yuxiao, Wang, Jia

arXiv.org Artificial Intelligence, Jan-16-2025

Diffusion Models (DMs) have impressive capabilities among generative models, but are limited by slow inference speeds and high computational costs. Previous works utilize one-shot structure pruning to derive lightweight DMs from pre-trained ones, but this approach often leads to a significant drop in generation quality and may result in the removal of crucial weights. Thus we propose an iterative pruning method based on gradient flow, including the gradient flow pruning process and the gradient flow pruning criterion. We employ a progressive soft pruning strategy to maintain the continuity of the mask matrix and guide it along the gradient flow of the energy function based on the pruning criterion in sparse space, thereby avoiding the sudden information loss typically caused by one-shot pruning. The gradient-flow-based criterion prunes parameters whose removal increases the gradient norm of the loss function and can enable fast convergence for a pruned model in the iterative pruning stage. Our extensive experiments on widely used datasets demonstrate that our method achieves superior performance in efficiency and consistency with pre-trained models.

arxiv preprint arxiv, gradient flow, pruning, (13 more...)

arXiv.org Artificial Intelligence

2501.09464

Country:

Asia > China > Shanghai > Shanghai (0.06)
Europe > Switzerland (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Layer-Adaptive State Pruning for Deep State Space Models

Gwak, Minseon, Moon, Seongrok, Ko, Joohwan, Park, PooGyeon

arXiv.org Artificial Intelligence, Jan-14-2025

Due to the lack of state dimension optimization methods, deep state space models (SSMs) have sacrificed model capacity, training search space, or stability to alleviate computational costs caused by high state dimensions. In this work, we provide a structured pruning method for SSMs, Layer-Adaptive STate pruning (LAST), which reduces the state dimension of each layer while minimizing model-level output energy loss by extending modal truncation for a single system. LAST scores are evaluated using the $\mathcal{H}_{\infty}$ norms of subsystems and layer-wise energy normalization. The scores serve as global pruning criteria, enabling cross-layer comparison of states and layer-adaptive pruning. Across various sequence benchmarks, LAST optimizes previous SSMs, revealing the redundancy and compressibility of their state spaces. Notably, we demonstrate that, on average, pruning 33% of states still maintains performance with 0.52% accuracy loss in multi-input multi-output SSMs without retraining. Code is available at https://github.com/msgwak/LAST.
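The idea of scoring states by subsystem norms and comparing them across layers can be sketched as follows for diagonal, discrete-time SSM layers. All function names are hypothetical, and the normalization used here (each state's share of the layer's total squared norm) is a simplifying assumption rather than the paper's exact LAST score; the authors' repository linked above is the reference implementation.

```python
def mode_hinf_norms(poles, b, c):
    """H-infinity norm of each first-order mode c_i * b_i / (z - pole_i).

    For a stable discrete-time mode (|pole_i| < 1) the peak gain over the
    unit circle is |c_i * b_i| / (1 - |pole_i|).
    """
    return [abs(ci * bi) / (1.0 - abs(p)) for p, bi, ci in zip(poles, b, c)]


def layer_normalized_scores(norms):
    """Share of the layer's total squared norm carried by each state.

    Assumed normalization, chosen only so that layers with different
    overall gain become comparable.
    """
    total = sum(n * n for n in norms)
    return [n * n / total for n in norms]


def global_prune(layer_scores, num_prune):
    """Drop the num_prune lowest-scoring states across all layers.

    Returns, for each layer, the sorted indices of the states that are kept.
    """
    ranked = sorted(
        (score, layer, idx)
        for layer, scores in enumerate(layer_scores)
        for idx, score in enumerate(scores)
    )
    dropped = {(layer, idx) for _, layer, idx in ranked[:num_prune]}
    return [
        [idx for idx in range(len(scores)) if (layer, idx) not in dropped]
        for layer, scores in enumerate(layer_scores)
    ]


# Two toy layers with 2 and 3 states; prune 2 of the 5 states globally.
layers = [
    layer_normalized_scores(mode_hinf_norms([0.9, 0.1], [1.0, 1.0], [1.0, 1.0])),
    layer_normalized_scores(mode_hinf_norms([0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [1.0, 1.0, 0.5])),
]
kept = global_prune(layers, num_prune=2)
```

Ranking all states in one global list is what makes the pruning layer-adaptive: a layer dominated by a single strong mode gives up more states than a layer whose modes contribute evenly.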

pruning, pruning ratio, state dimension, (13 more...)

arXiv.org Artificial Intelligence

2411.02824

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models

Cheng, Hongrong, Zhang, Miao, Shi, Javen Qinfeng

arXiv.org Artificial Intelligence, Jul-16-2024

As Large Language Models (LLMs) grow dramatically in size, there is an increasing trend in compressing and speeding up these models. Previous studies have highlighted the usefulness of gradients for importance scoring in neural network compression, especially in pruning medium-size networks. However, the substantial memory requirements involved in calculating gradients with backpropagation impede the utilization of gradients in guiding LLM pruning. As a result, most pruning strategies for LLMs rely on gradient-free criteria, such as weight magnitudes or a mix of magnitudes and activations. In this paper, we devise a hybrid pruning criterion, which appropriately integrates magnitude, activation, and gradient to capitalize on feature map sensitivity for pruning LLMs. To overcome memory requirement barriers, we estimate gradients using only forward passes. Based on this, we propose a Memory-effIcieNt structured prunIng procedure for LLMs (MINI-LLM) to remove non-critical channels and multi-attention heads. Experimental results demonstrate the superior performance of MINI-LLM over existing gradient-free methods on three LLMs: LLaMA, BLOOM, and OPT across various downstream tasks (classification, multiple-choice, and generation), while MINI-LLM maintains a GPU memory footprint akin to gradient-free methods.
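One common way to estimate gradients from forward passes alone is a randomized central difference, sketched below on a toy loss. This is an illustrative assumption about how such an estimate can be built, not the paper's procedure: `forward_only_gradient` and `hybrid_scores` are hypothetical names, and the product form of the hybrid score is a placeholder for whatever combination of magnitude, activation, and gradient the authors actually use.

```python
import random


def forward_only_gradient(loss_fn, weights, num_samples=64, eps=1e-3, seed=0):
    """Estimate d loss / d weights from loss evaluations alone.

    Each sample perturbs all weights along a random +/-1 direction z and uses
    the central difference (L(w + eps*z) - L(w - eps*z)) / (2*eps) as the
    directional derivative; averaging that slope times z over many samples
    approximates the gradient. No backpropagation state is stored.
    """
    rng = random.Random(seed)
    grad = [0.0] * len(weights)
    for _ in range(num_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in weights]
        plus = loss_fn([w + eps * zi for w, zi in zip(weights, z)])
        minus = loss_fn([w - eps * zi for w, zi in zip(weights, z)])
        slope = (plus - minus) / (2.0 * eps)
        for i, zi in enumerate(z):
            grad[i] += slope * zi / num_samples
    return grad


def hybrid_scores(weights, grads, activation_norms):
    """Hypothetical hybrid importance: |weight * gradient| * activation norm."""
    return [abs(w * g) * a for w, g, a in zip(weights, grads, activation_norms)]


# Toy quadratic loss whose true gradient is 2 * w.
def toy_loss(w):
    return sum(x * x for x in w)


weights = [1.0, -2.0, 0.5]
grads = forward_only_gradient(toy_loss, weights, num_samples=4000)
scores = hybrid_scores(weights, grads, activation_norms=[1.0, 1.0, 1.0])
```

The estimate is noisy for any finite number of samples, so in practice the sample count trades extra forward passes against the quality of the importance ranking.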

gradient, mini-llm, pruning, (14 more...)

arXiv.org Artificial Intelligence

2407.11681

Country:

North America > United States (0.14)
Oceania > Australia > South Australia > Adelaide (0.04)
Asia > India (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood

Dhahri, Rayen, Immer, Alexander, Charpentier, Bertrand, Günnemann, Stephan, Fortuin, Vincent

arXiv.org Machine Learning, Feb-24-2024

Neural network sparsification is a promising avenue to save computational time and memory costs, especially in an age where many successful AI models are becoming too large to naïvely deploy on consumer hardware. While much work has focused on different weight pruning criteria, the overall sparsifiability of the network, i.e., its capacity to be pruned without quality loss, has often been overlooked. We present Sparsifiability via the Marginal likelihood (SpaM), a pruning framework that highlights the effectiveness of using the Bayesian marginal likelihood in conjunction with sparsity-inducing priors for making neural networks more sparsifiable. Our approach implements an automatic Occam's razor that selects the most sparsifiable model that still explains the data well, both for structured and unstructured sparsification. In addition, we demonstrate that the pre-computed posterior Hessian approximation used in the Laplace approximation can be re-used to define a cheap pruning criterion, which outperforms many existing (more expensive) approaches. We demonstrate the effectiveness of our framework, especially at high sparsity levels, across a range of different neural network architectures and datasets.
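A curvature-based criterion of this kind can be sketched in the spirit of Optimal Brain Damage: each weight is scored by its squared value times the corresponding diagonal entry of the posterior precision. The sketch assumes a diagonal Hessian approximation is already available from the Laplace fit; the function names and the exact form of the score are illustrative assumptions, not the paper's definition.

```python
def posterior_saliency(weights, posterior_precision_diag):
    """OBD-style saliency: 0.5 * precision_i * weight_i ** 2.

    A weight is cheap to remove when it is small, when the posterior is flat
    in its direction (low precision), or both.
    """
    return [0.5 * p * w * w for w, p in zip(weights, posterior_precision_diag)]


def prune_by_saliency(weights, posterior_precision_diag, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest saliency."""
    scores = posterior_saliency(weights, posterior_precision_diag)
    num_prune = int(sparsity * len(weights))
    order = sorted(range(len(weights)), key=lambda i: scores[i])
    pruned = set(order[:num_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# The largest weight (-2.0) sits in a flat direction, so it is pruned,
# while the small weight 0.1 is kept because its precision is high.
weights = [1.0, 0.1, -2.0, 0.5]
precision = [1.0, 100.0, 0.01, 1.0]
sparse_weights = prune_by_saliency(weights, precision, sparsity=0.5)
```

The example is chosen to show where this differs from plain magnitude pruning, which would have removed 0.1 and 0.5 instead.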

approximation, pruning, sparsity, (11 more...)

arXiv.org Machine Learning

2402.15978

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
North America > United States > Wisconsin (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Cup Curriculum: Curriculum Learning on Model Capacity

Scharr, Luca, Toborek, Vanessa

arXiv.org Artificial Intelligence, Nov-7-2023

Curriculum learning (CL) aims to increase the performance of a learner on a given task by applying a specialized learning strategy. This strategy focuses on either the dataset, the task, or the model. There is little to no work analysing how CL can be applied to model capacity in natural language processing. To close this gap, we propose the cup curriculum. In a first phase of training we use a variation of iterative magnitude pruning to reduce model capacity. The pruned weights are reintroduced in a second phase, so that the model capacity follows a cup-shaped curve over the training iterations. We empirically evaluate different strategies of the cup curriculum and show that it outperforms early stopping reliably while exhibiting a high resilience to overfitting.
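The cup shape can be illustrated with a capacity schedule that shrinks geometrically during the pruning phase and is mirrored during the reintroduction phase, combined with a plain magnitude mask. Both helper names are hypothetical, and the symmetric schedule is an assumption made for the sketch; the paper evaluates several reintroduction strategies that may differ from it.

```python
def cup_schedule(num_prune_steps, prune_rate):
    """Fraction of weights kept at each stage: down, then mirrored back up."""
    down = [(1.0 - prune_rate) ** step for step in range(num_prune_steps + 1)]
    return down + down[-2::-1]


def magnitude_mask(weights, keep_fraction):
    """Binary mask keeping the keep_fraction largest-magnitude weights."""
    num_keep = int(round(keep_fraction * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:num_keep])
    return [1 if i in keep else 0 for i in range(len(weights))]


# Capacity over training stages for 3 pruning steps at 50% per step.
schedule = cup_schedule(num_prune_steps=3, prune_rate=0.5)
weights = [0.1, -3.0, 2.0, -0.5]
masks = [magnitude_mask(weights, fraction) for fraction in schedule]
```

In a real training loop the weights would be retrained between stages, so the mask at each stage would be computed from the current weights rather than from one fixed vector as in this toy example.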

cup curriculum, curriculum, model capacity, (14 more...)

arXiv.org Artificial Intelligence

2311.03956

Country: Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

Yang, Haichuan, Shangguan, Yuan, Wang, Dilin, Li, Meng, Chuang, Pierce, Zhang, Xiaohui, Venkatesh, Ganesh, Kalinli, Ozlem, Chandra, Vikas

arXiv.org Artificial Intelligence, Jul-20-2022

From wearables to powerful smart devices, modern automatic speech recognition (ASR) models run on a variety of edge devices with different computational budgets. To navigate the Pareto front of model accuracy vs model size, researchers are trapped in a dilemma of optimizing model accuracy by training and fine-tuning models for each individual edge device while keeping the training GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes. We develop training strategies for Omni-sparsity DNN that allow it to find models along the Pareto front of word-error-rate (WER) vs model size while keeping the training GPU-hours to no more than that of training a single model. We demonstrate the Omni-sparsity DNN with streaming E2E ASR models. Our results show great savings in training time and resources with similar or better accuracy on LibriSpeech compared to individually pruned sparse models: 2%-6.6% better WER on Test-other.

omni-sparsity dnn, sparse model, supernet, (12 more...)

arXiv.org Artificial Intelligence

2110.08352

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Blending Pruning Criteria for Convolutional Neural Networks

He, Wei, Huang, Zhongzhan, Liang, Mingfu, Liang, Senwei, Yang, Haizhao

arXiv.org Artificial Intelligence, Jul-11-2021

The advancement of convolutional neural networks (CNNs) on various vision applications has attracted lots of attention. Yet the majority of CNNs are unable to satisfy the strict requirements for real-world deployment. To overcome this, network pruning, which has recently become popular, is an effective method to reduce the redundancy of the models. However, the ranking of filters according to their "importance" on different pruning criteria may be inconsistent. One filter could be important according to a certain criterion, while it is unnecessary according to another one, which indicates that each criterion is only a partial view of the comprehensive "importance". From this motivation, we propose a novel framework to integrate the existing filter pruning criteria by exploring the criteria diversity. The proposed framework contains two stages: Criteria Clustering and Filters Importance Calibration. First, we condense the pruning criteria via layerwise clustering based on the rank of "importance" score. Second, within each cluster, we propose a calibration factor to adjust the significance of each selected blending candidate and search for the optimal blending criterion via Evolutionary Algorithm. Quantitative results on the CIFAR-100 and ImageNet benchmarks show that our framework outperforms the state-of-the-art baselines with regard to the compact model performance after pruning.
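The inconsistency between criteria is easy to reproduce on a toy layer: the L1 and L2 norms of the same filters can rank them differently, and a rank correlation quantifies how much two criteria agree. The sketch below uses these two norms and Spearman's coefficient as stand-ins; the helper names are hypothetical, and the paper's clustering and evolutionary search are not shown.

```python
def l1_norm(filter_weights):
    return sum(abs(x) for x in filter_weights)


def l2_norm(filter_weights):
    return sum(x * x for x in filter_weights) ** 0.5


def ranks(scores):
    """Rank 0 is the most important filter (highest score); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    rank = [0] * len(scores)
    for position, idx in enumerate(order):
        rank[idx] = position
    return rank


def spearman(scores_a, scores_b):
    """Spearman rank correlation between two importance scorings (no ties)."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    n = len(ranks_a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Three flattened filters: one dense, one with a single large weight, one tiny.
filters = [[1.0, 1.0, 1.0, 1.0], [3.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.0, 0.0]]
l1_scores = [l1_norm(f) for f in filters]
l2_scores = [l2_norm(f) for f in filters]
agreement = spearman(l1_scores, l2_scores)
```

Here the L1 norm prefers the dense filter while the L2 norm prefers the one with a single large weight, so the two rankings agree only partially. Criteria whose rankings correlate strongly within a layer are natural candidates for the same cluster.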

criteria, criterion, pruning criteria, (12 more...)

arXiv.org Artificial Intelligence

2107.05033

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
