Activation-Informed Pareto-Guided Low-Rank Compression for Efficient LLM/VLM

Solgi, Ryan; Madinei, Parsa; Tian, Jiayi; Swaminathan, Rupak; Liu, Jing; Susanj, Nathan; Zhang, Zheng

Oct-8-2025–arXiv.org Artificial Intelligence

Large language models (LLMs) and vision-language models (VLMs) have achieved state-of-the-art performance, but they pose significant memory and compute challenges for deployment. We present a novel low-rank compression framework to address this challenge. First, we upper-bound the change in network loss in terms of layer-wise, activation-based compression errors, filling a theoretical gap in the literature. We then formulate low-rank model compression as a bi-objective optimization problem and prove that a single uniform tolerance yields surrogate Pareto-optimal heterogeneous ranks. Based on these theoretical insights, we propose Pareto-Guided Singular Value Decomposition (PGSVD), a zero-shot pipeline that improves activation-aware compression through Pareto-guided rank selection and an alternating least-squares implementation. We apply PGSVD to both LLMs and VLMs, showing better accuracy at the same compression levels, as well as inference speedup.
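The abstract does not spell out PGSVD's algorithm, so the NumPy sketch below is only a rough illustration of the two ideas it names, under stated assumptions. It assumes (1) activation-aware compression means minimizing the per-layer error ||(W - AB)X||_F on calibration activations X, solved here with Cholesky-based whitening (a technique used in some prior activation-aware SVD methods, not necessarily the paper's formulation); (2) "a single uniform tolerance" means one relative-error threshold applied to every layer, with each layer's rank set to the smallest one that meets it; and (3) a plain alternating least-squares (ALS) loop for the same objective stands in for the paper's ALS implementation. All function names (`whitened_svd`, `rank_for_tolerance`, `compress_layer`, `als_factors`, `rel_act_error`) and the toy data are hypothetical.

```python
import numpy as np


def whitened_svd(W, X, reg=1e-6):
    """Activation-aware SVD of W (d_out x d_in) w.r.t. activations X (d_in x n).

    With G = X X^T = S S^T (Cholesky), ||(W - W_r) X||_F = ||(W - W_r) S||_F,
    so truncating the SVD of W S gives the best rank-r fit in activation space.
    """
    G = X @ X.T + reg * np.eye(X.shape[0])  # tiny ridge keeps G positive definite
    S = np.linalg.cholesky(G)               # lower triangular, S @ S.T == G
    U, sig, Vt = np.linalg.svd(W @ S, full_matrices=False)
    return U, sig, Vt, S


def rank_for_tolerance(sig, tol):
    """Smallest rank whose relative activation-space error is <= tol."""
    energy = sig ** 2
    # tail[r] = sum(energy[r:]): squared residual if the top r values are kept
    tail = np.append(np.cumsum(energy[::-1])[::-1], 0.0)
    rel_err = np.sqrt(tail / energy.sum())
    return int(np.argmax(rel_err <= tol))  # rel_err is non-increasing and ends at 0


def compress_layer(W, X, tol):
    """Factor W ~= A @ B with A (d_out x r), B (r x d_in), r chosen by tol."""
    U, sig, Vt, S = whitened_svd(W, X)
    r = rank_for_tolerance(sig, tol)
    A = U[:, :r] * sig[:r]
    B = np.linalg.solve(S.T, Vt[:r].T).T  # B = Vt_r @ inv(S)
    return A, B, r


def als_factors(W, X, r, iters=50, seed=0):
    """ALS for min_{A,B} ||(W - A B) X||_F at fixed rank r >= 1.

    Assumes X @ X.T is invertible (more samples than input features).
    """
    G = X @ X.T
    B = np.random.default_rng(seed).standard_normal((r, W.shape[1]))
    for _ in range(iters):
        A = np.linalg.solve(B @ G @ B.T, B @ G @ W.T).T  # best A for fixed B
        B = np.linalg.solve(A.T @ A, A.T @ W)             # best B for fixed A
    return A, B


def rel_act_error(W, A, B, X):
    """Relative error of the factorization measured on the activations X."""
    return np.linalg.norm((W - A @ B) @ X) / np.linalg.norm(W @ X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 32, 256
    X = rng.standard_normal((d, n))  # toy activations, shared by both layers for brevity

    def toy_layer(decay):
        # Random orthogonal factors with singular values decay**i
        U, _ = np.linalg.qr(rng.standard_normal((d, d)))
        V, _ = np.linalg.qr(rng.standard_normal((d, d)))
        return U @ np.diag(decay ** np.arange(d)) @ V.T

    tol = 0.3  # one tolerance shared by every layer
    for name, W in [("fast-decay", toy_layer(0.5)), ("slow-decay", toy_layer(0.97))]:
        A, B, r = compress_layer(W, X, tol)
        A2, B2 = als_factors(W, X, r)
        print(name, "rank", r,
              "svd err", round(rel_act_error(W, A, B, X), 4),
              "als err", round(rel_act_error(W, A2, B2, X), 4))
```

Under these assumptions, the one shared tolerance gives the fast-decaying layer a small rank and the slow-decaying layer a much larger one, which is the kind of heterogeneous allocation the abstract refers to. The ALS loop approaches the same activation-space error as the truncated SVD without forming a Cholesky factor. Nothing here demonstrates the paper's Pareto-optimality result or its loss bound.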

artificial intelligence, large language model, natural language