Revisiting Knowledge Distillation for Autoregressive Language Models
