 sparsity


Towards Data-Agnostic Pruning At Initialization: What Makes a Good Sparse Mask?

Neural Information Processing Systems

Although pruning-at-initialization (PaI) methods manage to find trainable subnetworks that outperform random pruning, their performance in terms of both accuracy and computational reduction is far from satisfactory compared to post-training pruning, and an understanding of PaI is still missing.


Dynamic Sparsity Is Channel-Level Sparsity Learner

Lu Yin, Gen Li

Neural Information Processing Systems

Sparse training has attracted surging interest in machine learning because of its potential to reduce cost across the entire training process as well as inference.



Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

Neural Information Processing Systems

Although several works have tried to reduce their computational cost, most LLMs still apply attention between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the model's expressiveness, resulting in reduced memory and computational