A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models