A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Jun-12-2026, 02:44:03 GMT–Neural Information Processing Systems

Training high-performing Small Language Models (SLMs) remains computationally expensive, even with knowledge distillation and pruning from larger teacher models. Existing approaches often face three key challenges: (1) information loss from hard pruning, (2) inefficient alignment of representations, and (3) underutilization of informative activations, particularly from Feed-Forward Networks (FFNs). To address these challenges, we introduce \textbf{Low-Rank Clone (LRC)}, an efficient pre-training method that constructs SLMs aspiring to behavioral equivalence with strong teacher models. LRC trains a set of low-rank projection matrices that jointly enable soft pruning by compressing teacher weights, and activation clone by aligning student activations, including FFN signals, with those of the teacher.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Jun-12-2026, 02:44:03 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)