AITopics

Country:

North America > United States > New Jersey (0.04)
North America > United States > Illinois > Cook County > Lemont (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Government > Regional Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing SystemsFeb-8-2026, 02:16:25 GMT

10fc83943b4540a9524af6fc67a23fef-Paper-Conference.pdf

alphapruning, pl alpha hill, sparsity, (11 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing SystemsFeb-7-2026, 22:34:59 GMT

0e9a05f5ce62284c91e4a33498899124-Paper-Conference.pdf

arxiv preprint arxiv, dataset, sparsity, (14 more...)

Country:

Asia > Singapore (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Yin, Ruokai, Li, Yuhang, Lee, Donghyun, Panda, Priyadarshini

DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs

arXiv.org Artificial IntelligenceNov-14-2025

Large language models (LLMs) deliver strong performance but are difficult to deploy due to high memory and compute costs. While pruning reduces these demands, most methods ignore activation sparsity observed at runtime. We reinterpret activation sparsity as dynamic structured weight sparsity and propose DuoGPT, a unified framework that constructs dual-sparse (spMspV) workloads by combining unstructured weight pruning with activation sparsity. To preserve accuracy, we extend the Optimal Brain Compression (OBC) framework with activation-aware calibration and introduce output residuals from the dense model as correction terms. We further optimize the solution for efficient GPU execution, enabling scalability to billion-parameter LLMs. Evaluations on LLaMA-2 and LLaMA-3 show that DuoGPT outperforms state-of-the-art structured pruning methods by up to 9.17% accuracy at an iso-speedup of 1.39$\times$ compared to the baseline dense model. Code is available at Github.

large language model, machine learning, sparsity, (18 more...)

2506.20194

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-10-2025, 02:36:06 GMT

SparseLLM: Towards Global Pruning of Pre-trained Language Models

Pruning has emerged as a pivotal compression strategy, introducing sparsity to enhance both memory and computational efficiency.

arxiv preprint arxiv, pruning, sparsellm, (14 more...)

Country:

North America > United States > New Jersey (0.04)
North America > United States > Illinois > Cook County > Lemont (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Government > Regional Government (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing SystemsOct-9-2025, 18:49:42 GMT

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

We have open-sourced our code.

alphapruning, pl alpha hill, sparsity, (12 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural Information Processing SystemsOct-9-2025, 18:37:20 GMT

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Large Language Models (LLMs) are distinguished by their massive parameter counts, which typically result in significant redundancy.

arxiv preprint arxiv, dataset, sparsity, (12 more...)

Country:

Asia > Singapore (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

arXiv.org Artificial IntelligenceSep-17-2025

Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction

Lucas, Ryan, Behdin, Kayhan, Wang, Zhipeng, Song, Qingquan, Tang, Shao, Mazumder, Rahul

Reasoning language models such as DeepSeek-R1 produce long chain-of-thought traces during inference time which make them costly to deploy at scale. We show that using compression techniques such as neural network pruning produces greater performance loss than in typical language modeling tasks, and in some cases can make the model slower since they cause the model to produce more thinking tokens but with worse performance. We show that this is partly due to the fact that standard LLM pruning methods often focus on input reconstruction, whereas reasoning is a decode-dominated task. We introduce a simple, drop-in fix: during pruning we jointly reconstruct activations from the input and the model's on-policy chain-of-thought traces. This "Reasoning-Aware Compression" (RAC) integrates seamlessly into existing pruning workflows such as SparseGPT, and boosts their performance significantly. Code reproducing the results in the paper can be found at: https://github.com/RyanLucas3/RAC

calibration, large language model, machine learning, (18 more...)

2509.12464

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

arXiv.org Artificial IntelligenceJun-3-2025

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Dong, Peijie, Tang, Zhenheng, Liu, Xiang, Li, Lujun, Chu, Xiaowen, Li, Bo

Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks only focus on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring the agentic capabilities - workflow, tool use/function call, long-context understanding and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression impacts LLMs' agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) quantization (GPTQ, AWQ) and pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5 7B-32B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression tradeoffs: 4-bit quantization preserves workflow generation and tool use (1%-3% drop) but degrades real-world application accuracy by 10%-15%. We introduce ERank, Top-k Ranking Correlation and Energy to systematize analysis. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios. The code can be found in https://github.com/pprp/ACBench.

large language model, machine learning, natural language, (18 more...)

2505.19433

Country:

North America > United States (0.45)
Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Workflow (0.89)

Industry:

Energy (0.67)
Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMay-21-2025

Improved Methods for Model Pruning and Knowledge Distillation

Jiang, Wei, Fu, Anying, Zhang, Youling

Model pruning is a performance optimization technique for large language models like R1 or o3-mini. However, existing pruning methods often lead to significant performance degradation or require extensive retraining and fine-tuning. This technique aims to identify and remove neurons, connections unlikely leading to the contribution during the human-computer interaction phase. Our goal is to obtain a much smaller and faster knowledge distilled model that can quickly generate content almost as good as those of the unpruned ones. We propose MAMA Pruning, short for Movement and Magnitude Analysis, an improved pruning method that effectively reduces model size and computational complexity while maintaining performance comparable to the original unpruned model even at extreme pruned levels. The improved method is based on weights, bias fixed in the pre-training phase and GRPO rewards verified during the post-training phase as our novel pruning indicators. Preliminary experimental results show that our method outperforms and be comparable to state-of-the-art methods across various pruning levels and different downstream computational linguistics tasks.

large language model, machine learning, natural language, (16 more...)

2505.14052

Country:

North America > United States > Hawaii (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)