TokenMixup: EfficientAttention-guided Token-levelDataAugmentationforTransformers

Neural Information Processing Systems 

Its objective is to mix intermediate token sets so that the saliency level of a batch is maximized. To detect salient tokens,wetakeadvantage oftheattention maptoachievea 15speed-up compared to the gradient-based saliency estimator, without compromising model performance.