On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Yuandong Tian (FAIR), Beidi Chen (Carnegie Mellon University), Deepak Pathak (Carnegie Mellon University)
Neural Information Processing Systems
Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations.
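The title names a concrete technique, attention transfer: rather than reusing a pre-trained teacher's learned representations, a student ViT is trained to reproduce the teacher's attention patterns. As a rough, hypothetical sketch (not the authors' code; all module and function names below are invented), one distillation-style variant could match the student's per-layer attention maps to those of a frozen teacher:

```python
# Minimal sketch of attention-map distillation between two toy "ViTs".
# Hypothetical illustration only; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSelfAttention(nn.Module):
    """Single-head self-attention block that also returns its attention map."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return self.proj(attn @ v), attn


def attention_transfer_loss(student_attns, teacher_attns):
    """Match student attention maps to detached teacher attention maps, layer by layer."""
    return sum(
        F.mse_loss(s, t.detach()) for s, t in zip(student_attns, teacher_attns)
    ) / len(student_attns)


# Toy usage: two attention layers over a batch of 8 sequences of 16 tokens, dim 32.
dim, depth = 32, 2
teacher = nn.ModuleList(SimpleSelfAttention(dim) for _ in range(depth)).eval()
student = nn.ModuleList(SimpleSelfAttention(dim) for _ in range(depth))

x = torch.randn(8, 16, dim)

# Collect the frozen teacher's attention maps.
with torch.no_grad():
    t_attns, h = [], x
    for blk in teacher:
        h, a = blk(h)
        t_attns.append(a)

# Run the student and collect its attention maps.
s_attns, h = [], x
for blk in student:
    h, a = blk(h)
    s_attns.append(a)

loss = attention_transfer_loss(s_attns, t_attns)
loss.backward()  # gradients flow only into the student
print(float(loss))
```

The key design point this sketch illustrates is that only the attention maps, not the token representations, carry the supervision signal from teacher to student.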
- Genre:
- Research Report
- Experimental Study (0.93)
- New Finding (0.93)
- Industry:
- Education (0.46)
- Leisure & Entertainment (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks (1.00)
- Natural Language (1.00)
- Vision (1.00)