Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Neural Information Processing Systems 

Vision Transformers split every 2D image into a fixed number of patches, each of which is treated as a token.
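
The patch-to-token step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper name `split_into_patches`, the single-channel list-of-rows image, and the 224x224 input size are assumptions made for the example; only the 16x16 patch size comes from the title.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of equal-length rows) into non-overlapping square patches.

    Returns the tokens in row-major order; each token is the flattened
    list of pixel values of one patch. Hypothetical helper for illustration.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single token.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            tokens.append(patch)
    return tokens


# Assumed 224x224 single-channel image, split into 16x16 patches.
image = [[row * 224 + col for col in range(224)] for row in range(224)]
tokens = split_into_patches(image, 16)
print(len(tokens), len(tokens[0]))  # 196 256
```

With these assumed sizes, every image yields (224 / 16)^2 = 196 tokens regardless of its content, which is the fixed token count the sentence above refers to.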
