Long-Short Transformer: Efficient Transformers for Language and Vision

Neural Information Processing Systems 

Transformer-based models [1] have achieved great success in the domains of natural language processing (NLP) [2,3] and computer vision [4-6].