Compact Language Models via Pruning and Knowledge Distillation
Neural Information Processing Systems
Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch, which is extremely compute-intensive. In this paper, we investigate whether pruning an existing LLM and then re-training it with a small fraction (<3%) of the original training data can be a suitable alternative to repeated, full retraining.
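To make the pruning step concrete, below is a minimal, hypothetical sketch of unstructured magnitude pruning on a flat list of weights. This toy example is not the paper's method (which operates on an existing LLM and is followed by re-training on a small data fraction); it only illustrates the basic idea of removing the lowest-magnitude parameters.

```python
# Toy illustration of magnitude-based pruning: zero out the fraction
# `sparsity` of weights with the smallest absolute value. This is a
# simplified stand-in, not the pruning procedure used in the paper.

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| entries zeroed."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * sparsity)
    # Threshold below which weights are considered unimportant.
    threshold = flat[k] if k < len(flat) else float("inf")
    return [0.0 if abs(w) < threshold else w for w in weights]

# Prune half of a small hypothetical weight vector.
pruned = prune_by_magnitude([0.1, -2.0, 0.05, 1.5, -0.3, 0.02], 0.5)
# → [0.0, -2.0, 0.0, 1.5, -0.3, 0.0]
```

After pruning, the surviving weights would then be fine-tuned (re-trained) so the smaller model recovers accuracy, which is the step the abstract proposes to do with under 3% of the original training data.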