Streamlining Redundant Layers to Compress Large Language Models

Chen, Xiaodong; Hu, Yuxuan; Zhang, Jing; Wang, Yanling; Li, Cuiping; Chen, Hong

May-22-2024–arXiv.org Artificial Intelligence

This paper introduces LLM-Streamline, a layer pruning approach for large language models. It builds on the observation that layers differ in how much they change the hidden states, which makes it possible to identify the less important ones. LLM-Streamline has two parts: layer pruning, which removes the run of consecutive layers with the lowest importance, with the number of layers removed set by the target sparsity; and layer replacement, which trains a lightweight network to stand in for the pruned layers and mitigate the performance loss. The paper also proposes a new metric, "stability", to address the limitations of accuracy when evaluating model compression. Experiments show that LLM-Streamline surpasses previous state-of-the-art pruning methods in both accuracy and stability.
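The layer pruning step can be illustrated with a small sketch. The abstract does not give the importance formula, so everything below is an assumption made for illustration: importance is taken to be one minus the cosine similarity between a layer's input and output hidden state, the importance of a span is the sum over its layers, and the number of layers to prune is the target sparsity times the layer count, rounded. The function names (`layer_importance`, `least_important_span`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layer_importance(hidden_states):
    """Assumed importance score: 1 - cos(input, output) for each layer.

    hidden_states[i] is the input to layer i and hidden_states[i + 1]
    is its output, so N layers need N + 1 hidden states.
    """
    return [
        1.0 - cosine_similarity(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]


def least_important_span(importance, n_prune):
    """Start index of the n_prune consecutive layers with the lowest
    summed importance (summing is an assumption of this sketch)."""
    best_start, best_score = 0, float("inf")
    for start in range(len(importance) - n_prune + 1):
        score = sum(importance[start:start + n_prune])
        if score < best_score:
            best_start, best_score = start, score
    return best_start


# Toy hidden states for a hypothetical 4-layer model (2-dimensional vectors).
hidden_states = [
    [1.0, 0.0],   # input to layer 0
    [0.0, 1.0],   # output of layer 0: large change
    [0.1, 1.0],   # output of layer 1: small change
    [0.1, 1.05],  # output of layer 2: small change
    [1.0, 1.0],   # output of layer 3: moderate change
]

importance = layer_importance(hidden_states)
target_sparsity = 0.5  # assumed fraction of layers to remove
n_prune = round(target_sparsity * len(importance))
start = least_important_span(importance, n_prune)
print(f"prune layers {start}..{start + n_prune - 1}")
```

In this toy example layers 1 and 2 barely change the hidden state, so they form the span selected for removal. In the method described above, a lightweight network would then be trained to replace that span; that step is not sketched here.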

Country:
- Asia > China > Shaanxi Province > Xi'an (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.49)
