Revisiting Token Dropping Strategy in Efficient BERT Pretraining
