IteRABRe: Iterative Recovery-Aided Block Reduction
Wibowo, Haryo Akbarianto, Song, Haiyue, Tanaka, Hideki, Utiyama, Masao, Aji, Alham Fikri, Dabre, Raj
Large Language Models (LLMs) have grown increasingly expensive to deploy, driving the need for effective model compression techniques. While block pruning offers a straightforward approach to reducing model size, existing methods often struggle to maintain performance or require substantial computational resources for recovery. We present IteRABRe, a simple yet effective iterative pruning method that achieves superior compression results while requiring minimal computational resources. Using only 2.5M tokens for recovery, our method outperforms baseline approaches by ~3% on average when compressing the Llama3.1-8B and Qwen2.5-7B models. IteRABRe is particularly strong at preserving linguistic capabilities, showing a 5% improvement over the baselines on language-related tasks. Our analysis reveals distinct pruning characteristics between these models, while also demonstrating the preservation of multilingual capabilities.
arXiv.org Artificial Intelligence
Mar-8-2025
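The abstract describes an iterative prune-then-recover loop: repeatedly drop a transformer block, then run a brief recovery fine-tune on a small token budget (~2.5M tokens) before pruning again. The sketch below illustrates that general shape in PyTorch. It is only an assumption-laden illustration, not the paper's actual implementation: the block-selection criterion (input-output cosine similarity), the recovery hyperparameters, and the model interface (`model.blocks`, `model.embed`, `model.loss`) are all hypothetical placeholders.

```python
# Minimal sketch of iterative block reduction with recovery (illustrative only).
import torch
import torch.nn as nn


def block_importance(model, calib_batch):
    """Score each block by how little it changes its input.

    A block whose output is nearly identical to its input is assumed to be
    a candidate for removal. This cosine-similarity criterion is an
    assumption, not the criterion from the paper.
    """
    scores = []
    hidden = model.embed(calib_batch)          # hypothetical embedding call
    for block in model.blocks:                 # hypothetical nn.ModuleList of blocks
        out = block(hidden)
        sim = nn.functional.cosine_similarity(
            hidden.flatten(1), out.flatten(1), dim=-1
        ).mean()
        scores.append(sim.item())              # high similarity => low importance
        hidden = out
    return scores


def recover(model, recovery_loader, steps, lr=1e-5):
    """Brief recovery fine-tuning on a small token budget."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _, (inputs, targets) in zip(range(steps), recovery_loader):
        loss = model.loss(inputs, targets)      # hypothetical LM loss helper
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


def iterative_block_reduction(model, calib_batch, recovery_loader,
                              target_num_blocks, recovery_steps=100):
    """Repeat: drop the least important block, then recover, until the
    target depth is reached."""
    while len(model.blocks) > target_num_blocks:
        scores = block_importance(model, calib_batch)
        drop_idx = max(range(len(scores)), key=scores.__getitem__)
        del model.blocks[drop_idx]              # remove the chosen block in place
        model = recover(model, recovery_loader, recovery_steps)
    return model
```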