On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Neural Information Processing Systems
Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations.
Mar-22-2026, 13:56:22 GMT