Less is More: Task-aware Layer-wise Distillation for Language Model Compression