Distilling Transformers into Simple Neural Networks with Unlabeled Transfer Data