Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data
