AITopics | blockpruner

blockpruner

Accurate Sublayer Pruning for Large Language Models by Exploiting Latency and Tunability Information

Park, Seungcheol, Lee, Sojin, Kim, Jongjin, Lee, Jinsik, Jo, Hyunjik, Kang, U

arXiv.org Artificial Intelligence, Jun-5-2025

How can we accelerate large language models (LLMs) without sacrificing accuracy? The slow inference speed of LLMs prevents us from benefiting from their remarkable performance in diverse applications. This is mainly because numerous sublayers are stacked together in LLMs. Sublayer pruning compresses and accelerates LLMs by removing unnecessary sublayers. However, existing sublayer pruning algorithms are limited in accuracy because they select sublayers to prune naively, overlooking the different characteristics of each sublayer. In this paper, we propose SPRINT (Sublayer PRuning wIth LateNcy and Tunability Information), an accurate sublayer pruning method for LLMs. SPRINT accurately selects a target sublayer to prune by considering 1) the amount of latency reduction after pruning and 2) the tunability of sublayers. SPRINT iteratively prunes redundant sublayers and swiftly tunes the parameters of the remaining sublayers. Experiments show that SPRINT achieves the best accuracy-speedup trade-off, exhibiting up to 23.88 percentage points higher accuracy on zero-shot commonsense reasoning benchmarks compared to existing pruning algorithms.
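The iterative selection described in this abstract can be illustrated with a toy greedy loop. This is only a sketch under stated assumptions, not the SPRINT algorithm itself: the `Sublayer` record, the `damage_after_tuning` estimate, and the ratio used as a score are hypothetical stand-ins, since the abstract does not give the actual scoring rule or tuning procedure.

```python
from dataclasses import dataclass


@dataclass
class Sublayer:
    name: str
    latency_ms: float           # hypothetical: latency saved if this sublayer is removed
    damage_after_tuning: float  # hypothetical: error left after tuning the remaining sublayers


def pick_sublayer(candidates):
    # Assumed score: prefer sublayers that cost little accuracy per millisecond saved.
    return min(candidates, key=lambda s: s.damage_after_tuning / s.latency_ms)


def prune(sublayers, target_latency_ms):
    """Greedily remove sublayers until total latency is at or below the target."""
    remaining = list(sublayers)
    removed = []
    while remaining and sum(s.latency_ms for s in remaining) > target_latency_ms:
        victim = pick_sublayer(remaining)
        remaining.remove(victim)
        removed.append(victim.name)
        # A real method would tune the remaining sublayers here and
        # re-estimate the damage values before the next iteration.
    return remaining, removed


model = [
    Sublayer("mha_0", 2.0, 0.10),
    Sublayer("mlp_0", 3.0, 0.30),
    Sublayer("mha_1", 2.0, 0.02),
    Sublayer("mlp_1", 3.0, 0.06),
]
remaining, removed = prune(model, target_latency_ms=6.0)
print(removed)
```

The point of the sketch is that two sublayers with the same importance can rank differently once latency savings are taken into account, which is the motivation the abstract gives for using both signals.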

large language model, machine learning, sublayer, (19 more...)

2506.0351

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

MultiPruner: Balanced Structure Removal in Foundation Models

Muñoz, J. Pablo, Yuan, Jinjie, Jain, Nilesh

arXiv.org Artificial Intelligence, Jan-16-2025

Recently, state-of-the-art approaches for pruning large pre-trained models (LPMs) have demonstrated that the training-free removal of non-critical residual blocks in Transformers is viable for reducing model size, achieving results that outperform previous training-free pruning approaches. Motivated by these findings, we extend BlockPruner (Zhong et al., 2024) and propose MultiPruner, a pruning approach that surpasses recent training-free pruning methods by adopting a multidimensional, iterative, fine-grained pruning strategy. In MultiPruner, multidimensional pruning restores the structural balance in block-pruned models by sequentially compressing along three dimensions: i) residual blocks, ii) channels of multilayer perceptrons (MLP), and iii) attention heads. This solution improves zero-shot accuracy on downstream tasks compared to other techniques while also improving model compression ratios, producing compressed models with lower computing and memory requirements. Extensive experiments demonstrate the advantages of the proposed method across various large pre-trained models. The code and pruning configurations are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.
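The three-stage order named in this abstract (residual blocks, then MLP channels, then attention heads) can be sketched as simple bookkeeping on a model shape. Everything below is an assumption made for illustration: `ModelShape`, `param_count`, and `multi_dim_prune` are hypothetical names, widths are treated as uniform across blocks, and the amounts to drop are passed in by hand, whereas the actual method decides what to remove from importance estimates not described in the listing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelShape:
    attn_blocks: int    # number of attention residual blocks kept
    mlp_blocks: int     # number of MLP residual blocks kept
    mlp_channels: int   # intermediate MLP width (assumed uniform)
    heads: int          # attention heads per block (assumed uniform)
    hidden: int = 64
    head_dim: int = 16


def param_count(m):
    # Q, K, V and output projections: 4 matrices of size hidden x (heads * head_dim).
    attn = m.attn_blocks * 4 * m.hidden * m.heads * m.head_dim
    # Up and down projections: 2 matrices of size hidden x mlp_channels.
    mlp = m.mlp_blocks * 2 * m.hidden * m.mlp_channels
    return attn + mlp


def multi_dim_prune(m, drop_attn, drop_mlp, drop_channels, drop_heads):
    # Stage i: remove whole residual blocks.
    m = replace(m, attn_blocks=m.attn_blocks - drop_attn,
                mlp_blocks=m.mlp_blocks - drop_mlp)
    # Stage ii: narrow the MLP channels of the blocks that remain.
    m = replace(m, mlp_channels=m.mlp_channels - drop_channels)
    # Stage iii: remove attention heads from the blocks that remain.
    m = replace(m, heads=m.heads - drop_heads)
    return m


dense = ModelShape(attn_blocks=4, mlp_blocks=4, mlp_channels=256, heads=4)
pruned = multi_dim_prune(dense, drop_attn=1, drop_mlp=0, drop_channels=64, drop_heads=1)
print(param_count(dense), param_count(pruned))
```

Spreading the reduction over three dimensions lets a given compression ratio be reached without removing as many whole blocks, which is the "balance" the abstract refers to.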

large language model, machine learning, pruning, (18 more...)

2501.09949

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

BlockPruner: Fine-grained Pruning for Large Language Models

Zhong, Longguang, Wan, Fanqi, Chen, Ruijun, Quan, Xiaojun, Li, Liangzhi

arXiv.org Artificial Intelligence, Jun-20-2024

With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy, and pruning these layers has minimal impact on the overall performance. While various layer pruning methods have been developed based on this insight, they generally overlook the finer-grained redundancies within the layers themselves. In this paper, we delve deeper into the architecture of LLMs and demonstrate that finer-grained pruning can be achieved by targeting redundancies in multi-head attention (MHA) and multi-layer perceptron (MLP) blocks. We propose a novel, training-free structured pruning approach called BlockPruner. Unlike existing layer pruning methods, BlockPruner segments each Transformer layer into MHA and …

Figure 1: Block Influence (BI) scores (Men et al., 2024) for the Llama2-7B model (Touvron et al., 2023b) computed at both layer and block levels, where blocks/layers with lower BI scores indicate less importance. The model has 32 Transformer layers, each containing one MHA and one MLP block, totaling 64 blocks. Block-level BI scores are generally lower than layer-level scores, indicating finer-grained redundancies.
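The Block Influence score mentioned in the Figure 1 caption is attributed to Men et al. (2024), and the listing does not give its formula. The sketch below assumes the commonly cited definition, one minus the average cosine similarity between a block's input and output hidden states, so a block that barely changes its input scores near zero. The function names and the toy vectors are hypothetical; a real measurement would use hidden states collected from a model on calibration data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_influence(inputs, outputs):
    """Assumed BI definition: 1 - mean cosine similarity over token positions."""
    sims = [cosine(x, y) for x, y in zip(inputs, outputs)]
    return 1.0 - sum(sims) / len(sims)


def total_blocks(num_layers):
    # Each Transformer layer contributes one MHA block and one MLP block.
    return 2 * num_layers


hidden_in = [[1.0, 0.0], [0.0, 1.0]]
rescaled = [[2.0, 0.0], [0.0, 3.0]]   # same direction: the block changes nothing but scale
rotated = [[0.0, 1.0], [1.0, 0.0]]    # orthogonal: the block changes the representation fully

bi_low = block_influence(hidden_in, rescaled)
bi_high = block_influence(hidden_in, rotated)
print(bi_low, bi_high, total_blocks(32))
```

Under this definition, a low score marks a block as a pruning candidate, and scoring MHA and MLP blocks separately gives 64 candidates for a 32-layer model instead of 32.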

blockpruner, pruning, pruning ratio, (15 more...)

2406.10594

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
