Two Sparse Matrices are Better than One: Sparsifying Neural Networks with Double Sparse Factorization

Vladimír Boža, Vladimír Macko

arXiv.org Artificial Intelligence 

Neural networks are often challenging to work with due to their large size and complexity. To address this, various methods aim to reduce model size by sparsifying or decomposing weight matrices, such as magnitude pruning and low-rank or block-diagonal factorization. We propose Double Sparse Factorization (DSF), which factorizes a weight matrix into a product of two sparse matrices. Although solving this problem exactly is computationally infeasible, we propose an efficient heuristic based on alternating minimization via ADMM that achieves state-of-the-art results, enabling unprecedented sparsification of neural networks. For instance, in a one-shot pruning setting, our method can reduce the size of the LLaMA2-13B model by 50% while maintaining better performance than the dense LLaMA2-7B model. We also compare favorably with Optimal Brain Compression, the state-of-the-art layer-wise pruning approach for convolutional neural networks. Furthermore, the accuracy improvements of our method persist even after further model fine-tuning.

Sparse neural networks have gained attention due to their potential to reduce computational costs and memory usage, making them more efficient for deployment on resource-constrained devices (LeCun et al., 1989; Han et al., 2015; Hoefler et al., 2021). By reducing the number of non-zero parameters, sparse networks can achieve accuracy comparable to that of dense networks while requiring fewer operations.
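To make the idea of the factorization concrete, the sketch below shows a simplified alternating-minimization loop that approximates a weight matrix W as a product A @ B where both factors are sparse. This is not the paper's ADMM-based heuristic: each factor is instead updated by a plain least-squares solve followed by magnitude-based hard thresholding, and all names (sparsify, double_sparse_factorize, the density and iters parameters) are illustrative assumptions, not the authors' API.

```python
import numpy as np

def sparsify(M, density):
    """Keep the largest-magnitude entries of M, zeroing the rest.

    Hypothetical projection onto the sparsity constraint; the paper's
    method may use a different sparsity pattern or update rule.
    """
    k = int(density * M.size)
    thresh = np.partition(np.abs(M), -k, axis=None)[-k]
    return np.where(np.abs(M) >= thresh, M, 0.0)

def double_sparse_factorize(W, density=0.25, iters=50, seed=0):
    """Approximate W (m x n) as A @ B with sparse A (m x m) and B (m x n)."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    # Initialize A near the identity and B as a sparsified copy of W,
    # so A @ B starts close to W before alternating refinement.
    A = sparsify(np.eye(m) + 0.01 * rng.standard_normal((m, m)), density)
    B = sparsify(W.copy(), density)
    for _ in range(iters):
        # Fix A, update B: least-squares solve of A @ B ~= W, then project.
        B = sparsify(np.linalg.lstsq(A, W, rcond=None)[0], density)
        # Fix B, update A: solve B.T @ A.T ~= W.T, transpose back, project.
        A = sparsify(np.linalg.lstsq(B.T, W.T, rcond=None)[0].T, density)
    return A, B

# Usage: factorize a random 64x64 matrix into two 25%-dense factors.
W = np.random.default_rng(1).standard_normal((64, 64))
A, B = double_sparse_factorize(W)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative Frobenius error: {err:.3f}")
```

Note the intuition this sketch captures: with two sparse factors at 25% density each, the total non-zero budget matches a single 50%-sparse matrix, yet the product A @ B can realize a much richer set of matrices than masking W directly, which is the core argument behind the title.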