The Role of Pre-training Data in Transfer Learning
Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt
The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling up the pre-training data to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image, and contrastive image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of pre-training data source is essential for few-shot transfer, but that its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000X more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive and image-image contrastive pre-training, and find that the latter leads to better downstream accuracy.
arXiv.org Artificial Intelligence
Mar-1-2023
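The sketch below illustrates the transfer setup described in the abstract: take a pre-trained backbone, replace its classification head, and fine-tune on k examples per downstream class (few-shot) or on the full downstream training set. It is a minimal illustration, not the authors' code; the ImageNet-pretrained ResNet-50 backbone, CIFAR-10 downstream task, shot count, and hyperparameters are all assumed stand-ins for the pre-training sources and 9 downstream datasets compared in the paper.

```python
# Hypothetical sketch of few-shot / full fine-tuning transfer, assuming
# PyTorch + torchvision. Backbone, dataset, and hyperparameters are
# illustrative choices, not taken from the paper.
from collections import defaultdict

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms


def few_shot_subset(dataset, k):
    """Select k examples per class to emulate a k-shot fine-tuning split."""
    per_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        if len(per_class[label]) < k:
            per_class[label].append(idx)
    return Subset(dataset, [i for idxs in per_class.values() for i in idxs])


def fine_tune(backbone, train_loader, num_classes, epochs=10, lr=1e-3):
    """Full fine-tuning: replace the head and update all weights."""
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    optimizer = torch.optim.AdamW(backbone.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    backbone.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(backbone(images), labels)
            loss.backward()
            optimizer.step()
    return backbone


if __name__ == "__main__":
    # Supervised ImageNet pre-training stands in for one pre-training source;
    # CIFAR-10 stands in for a downstream task.
    tfm = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
    train = datasets.CIFAR10("data", train=True, download=True, transform=tfm)
    loader = DataLoader(few_shot_subset(train, k=5), batch_size=32, shuffle=True)
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model = fine_tune(model, loader, num_classes=10)
```

Swapping the backbone for a contrastively pre-trained image encoder (e.g., a CLIP-style model) while keeping the same fine-tuning loop is how the different pre-training methods and data sources would be compared under a controlled setup like the one the abstract describes.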
- Genre:
- Research Report
- Experimental Study (0.68)
- New Finding (0.93)