SparseLLM: Towards Global Pruning of Pre-trained Language Models 2 Chen Ling

May-24-2025, 09:33:31 GMT–Neural Information Processing Systems

The transformative impact of large language models (LLMs) like LLaMA and GPT on natural language processing is countered by their prohibitive computational demands. Pruning has emerged as a pivotal compression strategy, introducing sparsity to enhance both memory and computational efficiency. Yet, traditional global pruning is impractical for LLMs due to scalability issues, while local pruning, despite its efficiency, leads to suboptimal solutions. Addressing these challenges, we propose SparseLLM, a novel framework that redefines the global pruning process into manageable, coordinated subproblems, allowing for resource-efficient optimization with global optimality. SparseLLM's approach, which conceptualizes LLMs as a chain of modular functions and leverages auxiliary variables for problem decomposition, not only facilitates a pragmatic application on LLMs but also demonstrates significant performance improvements, particularly in high-sparsity regimes, surpassing current state-of-the-art methods.

large language model, machine learning, pruning, (18 more...)

Neural Information Processing Systems

May-24-2025, 09:33:31 GMT

Conferences PDF

Add feedback

Country:
- North America > United States (1.00)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Government > Regional Government (0.46)
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)