Dynamic Grained Encoder for Vision Transformers

Lin Song

Neural Information Processing Systems 

Transformers, the de facto standard for language modeling, have recently been applied to vision tasks. This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images and reduce computational cost.
