The Role of Pre-training Data in Transfer Learning
Rahim Entezari, Mitchell Wortsman, Olga Saukh, M. Moein Shariatnia, Hanie Sedghi, Ludwig Schmidt
The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. While most studies recommend scaling up the pre-training data to benefit most from transfer learning, a question remains: what data and method should be used for pre-training? We investigate the impact of the pre-training data distribution on few-shot and full fine-tuning performance using 3 pre-training methods (supervised, contrastive language-image, and contrastive image-image), 7 pre-training datasets, and 9 downstream datasets. Through extensive controlled experiments, we find that the choice of pre-training data source is essential for few-shot transfer, but that its role decreases as more data is made available for fine-tuning. Additionally, we explore the role of data curation and examine the trade-offs between label noise and the size of the pre-training dataset. We find that using 2000X more pre-training data from LAION can match the performance of supervised ImageNet pre-training. Furthermore, we investigate the effect of pre-training methods, comparing language-image contrastive and image-image contrastive pre-training, and find that the latter leads to better downstream accuracy.
arXiv.org Artificial Intelligence
Mar-1-2023
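The sketch below illustrates the transfer setup described in the abstract: take a pre-trained backbone, replace its classification head, and fine-tune on k examples per downstream class (few-shot) or on the full downstream training set. It is a minimal illustration, not the authors' code; the ImageNet-pretrained ResNet-50 backbone, CIFAR-10 downstream task, shot count, and hyperparameters are all assumed stand-ins for the pre-training sources and 9 downstream datasets compared in the paper.

```python
# Hypothetical sketch of few-shot / full fine-tuning transfer, assuming
# PyTorch + torchvision. Backbone, dataset, and hyperparameters are
# illustrative choices, not taken from the paper.
from collections import defaultdict

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, models, transforms


def few_shot_subset(dataset, k):
    """Select k examples per class to emulate a k-shot fine-tuning split."""
    per_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        if len(per_class[label]) < k:
            per_class[label].append(idx)
    return Subset(dataset, [i for idxs in per_class.values() for i in idxs])


def fine_tune(backbone, train_loader, num_classes, epochs=10, lr=1e-3):
    """Full fine-tuning: replace the head and update all weights."""
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    optimizer = torch.optim.AdamW(backbone.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    backbone.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(backbone(images), labels)
            loss.backward()
            optimizer.step()
    return backbone


if __name__ == "__main__":
    # Supervised ImageNet pre-training stands in for one pre-training source;
    # CIFAR-10 stands in for a downstream task.
    tfm = transforms.Compose([transforms.Resize(224), transforms.ToTensor()])
    train = datasets.CIFAR10("data", train=True, download=True, transform=tfm)
    loader = DataLoader(few_shot_subset(train, k=5), batch_size=32, shuffle=True)
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    model = fine_tune(model, loader, num_classes=10)
```

Swapping the backbone for a contrastively pre-trained image encoder (e.g., a CLIP-style model) while keeping the same fine-tuning loop is how the different pre-training methods and data sources would be compared under a controlled setup like the one the abstract describes.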
- Genre:
- Research Report
- Experimental Study (0.68)
- New Finding (0.93)