ESPACE: Dimensionality Reduction of Activations for Model Compression

May-26-2025, 18:16:40 GMT–Neural Information Processing Systems

We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. Because the approach is activation-centric, LLMs can be retrained with no loss of expressivity, while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal computational accuracy are provided. Experimentally, we find ESPACE enables 50% compression of GPT3, Llama2, and Nemotron4 models with small accuracy degradation, as low as a 0.18 perplexity increase on GPT3-22B.
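
The associativity argument in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the shapes, the synthetic data, and the choice to calibrate by eigendecomposing an uncentered activation autocorrelation matrix are all assumptions made for illustration. The sketch assumes a layer computes W^T X and that P has orthonormal columns, so X is approximated by P P^T X and the product regroups as (P^T W)^T (P^T X), where P^T W can be precomputed.

```python
import numpy as np

def calibrate_projection(calib_acts, rank):
    """Return P (K x rank) with orthonormal columns from activations (K x N).

    Assumption: principal components are taken as the top eigenvectors of the
    uncentered autocorrelation matrix of the calibration activations.
    """
    autocorr = calib_acts @ calib_acts.T / calib_acts.shape[1]
    eigvals, eigvecs = np.linalg.eigh(autocorr)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]       # indices of the largest ones
    return eigvecs[:, top]

def compress_weight(weight, proj):
    """Precompute P^T W, a (rank x M) matrix, from W (K x M)."""
    return proj.T @ weight

def projected_matmul(compressed_weight, proj, acts):
    """Compute (P^T W)^T (P^T X), which equals W^T (P P^T X) by associativity."""
    return compressed_weight.T @ (proj.T @ acts)

# Hypothetical toy sizes: K inputs, M outputs, N tokens, L retained components.
K, M, N, L = 64, 32, 256, 32
rng = np.random.default_rng(0)

# Synthetic activations that lie close to a 16-dimensional subspace.
basis = rng.standard_normal((K, 16))
acts = basis @ rng.standard_normal((16, N)) + 0.01 * rng.standard_normal((K, N))
weight = rng.standard_normal((K, M))

proj = calibrate_projection(acts, L)
w_small = compress_weight(weight, proj)          # stored in place of W

exact = weight.T @ acts
approx = projected_matmul(w_small, proj, acts)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"weight entries: {weight.size} -> {w_small.size}, relative error: {rel_err:.2e}")
```

In this toy setup the precomputed matrix has half as many entries as the original weight; the projection matrix itself is additional state and is not counted here. The error is small only because the synthetic activations were built to be nearly low-rank, so the numbers say nothing about real models.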

large language model, machine learning, natural language

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (0.74)
    - Statistical Learning > Dimensionality Reduction (0.65)
  - Natural Language > Large Language Model (1.00)