Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers

Neural Information Processing Systems 

Especially in the ImageNet-1k classification, DTEM achieves a 37.2% reduction in

Similar Docs  Excel Report  more

TitleSimilaritySource
None found