 linear layer


Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration

Pourkamali-Anaraki, Farhad

arXiv.org Machine Learning

The massive scale of pretrained models has made efficient compression essential for practical deployment. Low-rank decomposition based on the singular value decomposition (SVD) provides a principled approach for model reduction, but its exact computation is expensive for large weight matrices. Randomized alternatives such as randomized SVD (RSVD) improve efficiency, yet they can suffer from poor approximation quality when the singular value spectrum decays slowly, a regime commonly observed in modern pretrained models. In this work, we address this limitation from both theoretical and empirical perspectives. First, we establish a connection between low-rank approximation error and predictive performance by analyzing softmax perturbations, showing that deviations in class probabilities are controlled by the spectral error of the compressed weights. Second, we demonstrate that RSVD is inadequate, and we propose randomized subspace iteration (RSI) as a more effective alternative. By incorporating multiple power iterations, RSI improves spectral separation and provides a controllable mechanism for enhancing approximation quality. We evaluate our approach on both convolutional networks and transformer-based architectures. Our results show that RSI achieves near-optimal approximation quality while outperforming RSVD in predictive accuracy under aggressive compression, enabling efficient model compression.
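The contrast between RSVD and RSI described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it follows the standard randomized range-finder recipe, and the function names (`slow_decay_matrix`, `low_rank_factors`, `spectral_error`), the oversampling default, and the QR re-orthogonalization between power iterations are all assumptions made for illustration. Setting `n_power_iter=0` gives plain RSVD; larger values give subspace iteration.

```python
import numpy as np


def slow_decay_matrix(m, n, decay=0.5, seed=0):
    """Hypothetical test matrix whose singular values decay slowly (i ** -decay)."""
    rng = np.random.default_rng(seed)
    r = min(m, n)
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    s = np.arange(1, r + 1, dtype=float) ** (-decay)
    return (U * s) @ V.T


def low_rank_factors(W, rank, n_power_iter=0, oversample=10, seed=0):
    """Return A (m x rank) and C (rank x n) with W approximately equal to A @ C.

    n_power_iter = 0 corresponds to plain randomized SVD; n_power_iter > 0
    adds power iterations (randomized subspace iteration).
    """
    rng = np.random.default_rng(seed)
    m, n = W.shape
    k = min(rank + oversample, m, n)

    # Sketch the column space of W with a Gaussian test matrix.
    Q, _ = np.linalg.qr(W @ rng.standard_normal((n, k)))

    # Each power iteration multiplies by W.T and W, which sharpens the gap
    # between leading and trailing singular values. QR after every product
    # keeps the basis numerically well conditioned.
    for _ in range(n_power_iter):
        Z, _ = np.linalg.qr(W.T @ Q)
        Q, _ = np.linalg.qr(W @ Z)

    # Project W onto the captured subspace and take a small exact SVD there.
    U_small, s, Vt = np.linalg.svd(Q.T @ W, full_matrices=False)
    U = Q @ U_small

    A = U[:, :rank] * s[:rank]  # absorb singular values into the left factor
    C = Vt[:rank]
    return A, C


def spectral_error(W, A, C):
    """Spectral-norm error of the factorized approximation."""
    return np.linalg.norm(W - A @ C, ord=2)


if __name__ == "__main__":
    W = slow_decay_matrix(200, 150)
    rank = 20
    best = np.linalg.svd(W, compute_uv=False)[rank]  # Eckart-Young lower bound
    for q in (0, 1, 4):
        A, C = low_rank_factors(W, rank, n_power_iter=q)
        print(f"power iterations={q}: error={spectral_error(W, A, C):.4f} "
              f"(optimal {best:.4f})")
```

In a compressed linear layer, the two factors would replace a single weight matrix with two thinner ones, so the parameter count drops from `m * n` to `rank * (m + n)`. The number of power iterations is the knob the abstract refers to as a controllable mechanism: each extra iteration costs two more passes over `W`.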








A Appendix

Neural Information Processing Systems

In the following subsections, we provide theoretical derivations. In this subsection, we provide a formal description of the consistency property of score matching. Assumption A.4. (Compactness) The parameter space is compact. Assumption A.5. (Identifiability) There exists a set of parameters [...] A.3 are the conditions that ensure [...] A.7 lead to the uniform convergence property [...] In the following Lemma A.9 and Proposition A.10, we examine the sufficient condition for [...] We show that the sufficient conditions stated in Lemma A.9 can be satisfied using the [...] Figure A1: An illustration of the relationship between the variables discussed in Proposition 4.1, Lemma A.12, and Lemma A.13. The properties of KL divergence and Fisher divergence presented in the last two rows are derived in Lemmas A.12 [...] In this section, we provide formal derivations for Proposition 4.1, Lemma A.12, and Lemma A.13. Based on Remark A.14, the following holds: D [...] In this section, we elaborate on the experimental setups and provide the detailed configurations for the experiments presented in Section 5 of the main manuscript.