Compact Language Models via Pruning and Knowledge Distillation

Neural Information Processing Systems 

Large language models (LLMs) targeting different deployment scales and sizes are currently produced by training each variant from scratch, which is extremely compute-intensive. In this paper, we investigate whether pruning an existing LLM and then re-training it with a fraction (<3%) of the original training data can be a suitable alternative to repeated, full retraining.
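The abstract combines two ideas: structurally pruning an existing model and then recovering accuracy by distilling from the original (teacher) model on a small data budget. The sketch below is a minimal illustration of that combination, not the paper's actual recipe; the magnitude-based pruning criterion, the keep ratio, the temperature, and the loss weighting alpha are all assumptions chosen for clarity.

```python
# Illustrative prune-then-distill sketch (assumed details, not the paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F

def prune_linear_rows(layer: nn.Linear, keep_ratio: float) -> nn.Linear:
    """Keep the output neurons (weight rows) with the largest L2 norm."""
    n_keep = max(1, int(layer.out_features * keep_ratio))
    importance = layer.weight.norm(dim=1)              # one score per output neuron
    keep = importance.topk(n_keep).indices.sort().values
    pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend a soft-target KL term against the teacher with the usual CE on labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

During re-training, the pruned student would be updated with distillation_loss while the full teacher runs in inference mode, which is how a small (<3%) data fraction can still transfer most of the teacher's behavior.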
