Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
