A 2D Semantic-Aware Position Encoding for Vision Transformers

Open in new window