Dissecting Query-Key Interaction in Vision Transformers

Neural Information Processing Systems 

Self-attention in vision transformers is often thought to perform perceptual grouping, in which tokens attend to other tokens with similar embeddings that could correspond to semantically similar features of an object.
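To make the grouping intuition concrete, here is a minimal sketch of how query-key dot products turn embedding similarity into attention weights. It is an illustration only, not the method of the paper: the function name `attention_weights`, the toy embeddings, and the choice of identity query and key projections are all assumptions made for the example.

```python
import numpy as np

def attention_weights(X, W_q, W_k):
    """Row-wise softmax of scaled query-key dot products (single head)."""
    Q = X @ W_q                                  # queries, one row per token
    K = X @ W_k                                  # keys, one row per token
    scores = Q @ K.T / np.sqrt(K.shape[1])       # scaled dot-product scores
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

# Hypothetical toy tokens: 0 and 1 point in similar directions, 2 does not.
X = np.array([[3.0, 0.0],
              [2.5, 0.5],
              [0.0, 3.0]])

# Assumption: identity projections, so queries and keys equal the embeddings.
I = np.eye(2)
A = attention_weights(X, I, I)
print(np.round(A, 3))
```

With identity projections, token 0 places more weight on the similar token 1 than on the dissimilar token 2. In a trained model the learned `W_q` and `W_k` need not preserve this similarity structure, which is why the query-key interaction itself is worth examining.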
