On the Surprising Effectiveness of Attention Transfer for Vision Transformers

Yuandong Tian (FAIR), Beidi Chen (Carnegie Mellon University), Deepak Pathak (Carnegie Mellon University)

Neural Information Processing Systems

Conventional wisdom suggests that pre-training Vision Transformers (ViT) improves downstream performance by learning useful representations.