Gradient-based Intra-attention Pruning on Pre-trained Language Models
