Breaking BERT: Evaluating and Optimizing Sparsified Attention
