ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

Neural Information Processing Systems 

One-shot pruning techniques offer a way to alleviate the computational and memory burdens of deploying LLMs by removing redundant weights without the need for retraining. Yet, the massive scale of LLMs often forces current pruning approaches to rely on heuristics instead of optimization-based techniques, potentially resulting in suboptimal compression.