Pruning


Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?

Neural Information Processing Systems

Although pruning-at-initialization (PaI) methods manage to find trainable subnetworks that outperform random pruning, their performance in terms of both accuracy and computational reduction is far from satisfactory compared to post-training pruning, and a clear understanding of what makes PaI work is still missing.
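
As a rough illustration of what a data-agnostic sparse mask at initialization can look like, the sketch below builds a binary mask by magnitude-pruning freshly initialized weights, with no data involved. This is a minimal example under that assumption, not the method proposed in the paper.

```python
import torch
import torch.nn as nn

def magnitude_mask_at_init(layer: nn.Linear, sparsity: float) -> torch.Tensor:
    """Binary mask keeping the largest-magnitude weights of a freshly initialized layer."""
    w = layer.weight.detach().abs()
    k = max(1, int((1.0 - sparsity) * w.numel()))      # number of weights to keep
    threshold = torch.topk(w.flatten(), k).values.min()
    return (w >= threshold).float()

layer = nn.Linear(512, 256)                            # random init, no data involved
mask = magnitude_mask_at_init(layer, sparsity=0.9)     # keep ~10% of the weights
layer.weight.data.mul_(mask)                           # apply the sparse mask
print(f"achieved sparsity: {1 - mask.mean().item():.2f}")
```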



Dynamic Sparsity Is Channel-Level Sparsity Learner
Lu Yin, Gen Li

Neural Information Processing Systems

Sparse training has received surging interest in machine learning due to its tantalizing potential to cut costs across the entire training process as well as inference.
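
To make the notion of channel-level sparsity concrete, the following sketch measures how much of an unstructured sparse mask already corresponds to entirely empty output channels. It is an illustrative measurement with a made-up random mask, not the paper's training algorithm.

```python
import torch

def channel_level_sparsity(mask: torch.Tensor) -> float:
    """Fraction of output channels whose weights are all zero in a binary mask
    of shape (out_channels, in_channels, kH, kW)."""
    nonzeros_per_channel = mask.flatten(1).sum(dim=1)
    return (nonzeros_per_channel == 0).float().mean().item()

# Toy 90%-sparse random mask for a 64x64 3x3 conv layer: random unstructured
# sparsity almost never empties whole channels, which is the kind of structure
# the paper's title attributes to dynamic sparse training.
mask = (torch.rand(64, 64, 3, 3) > 0.9).float()
print(f"channel-level sparsity: {channel_level_sparsity(mask):.2%}")
```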



Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Neural Information Processing Systems

Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational cost.
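
A minimal sketch of the general idea of context pruning, assuming a key/value cache and a per-token importance score (here a stand-in derived from softmax-normalized random scores); both are assumptions for illustration, not the mechanism proposed in the paper.

```python
import torch

def prune_kv_cache(keys, values, importance, keep: int):
    """Keep only the `keep` past tokens with the highest importance scores.
    keys/values: (seq_len, d); importance: (seq_len,)."""
    if keys.shape[0] <= keep:
        return keys, values
    idx = torch.topk(importance, keep).indices.sort().values   # preserve token order
    return keys[idx], values[idx]

seq_len, d = 128, 64
keys, values = torch.randn(seq_len, d), torch.randn(seq_len, d)
importance = torch.softmax(torch.randn(seq_len), dim=0)        # stand-in importance scores
keys, values = prune_kv_cache(keys, values, importance, keep=32)
print(keys.shape)  # torch.Size([32, 64])
```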




Pruning vs Quantization: Which is Better?
Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort (Qualcomm AI Research)

Neural Information Processing Systems

In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks.
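
The following toy comparison hints at how such a study can be framed: it measures the weight reconstruction error of magnitude pruning versus uniform quantization on a random tensor. The sparsity level, bit width, and MSE metric are arbitrary choices for illustration, not the paper's evaluation protocol.

```python
import torch

w = torch.randn(4096)

# Pruning to 50% sparsity: zero out the smallest-magnitude half of the weights.
k = w.numel() // 2
thresh = w.abs().kthvalue(k).values
w_pruned = torch.where(w.abs() > thresh, w, torch.zeros_like(w))

# Uniform symmetric quantization to 4 bits.
bits = 4
scale = w.abs().max() / (2 ** (bits - 1) - 1)
w_quant = torch.clamp((w / scale).round(), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

print(f"pruning MSE:      {torch.mean((w - w_pruned) ** 2):.5f}")
print(f"quantization MSE: {torch.mean((w - w_quant) ** 2):.5f}")
```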


How a student becomes a teacher: learning and forgetting through Spectral methods

Neural Information Processing Systems

The above scheme proves particularly relevant when the student network is overparameterized (namely, when larger layer sizes are employed) as compared to the underlying teacher network. Under these operating conditions, it is tempting to speculate that the student's ability to handle the given task could eventually be stored in a sub-portion of the whole network.
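
For context, a generic teacher-student setup in which an overparameterized student is fit to a smaller, frozen teacher might look like the sketch below; this is a plain distillation example, not the spectral analysis used in the paper.

```python
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 1))    # narrow, frozen
student = nn.Sequential(nn.Linear(20, 200), nn.ReLU(), nn.Linear(200, 1))  # overparameterized
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(1000):
    x = torch.randn(256, 20)                                 # synthetic inputs
    loss = nn.functional.mse_loss(student(x), teacher(x))    # match the teacher's outputs
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```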