An Empirical Study of Scaling Laws for Transfer

Barnett, Matthew

arXiv.org Artificial Intelligence 

In recent years, a number of papers have uncovered machine learning scaling laws--empirical regularities that describe how a model's performance improves as a function of scale, usually parameter count and dataset size (Hestness et al. 2017, Kaplan et al. 2020, Hoffmann et al. 2022). Hernandez et al. 2021 described scaling laws for transfer learning, showing how the transfer properties of models change as a function of model size. Their primary result was that the degree of transfer--as measured by the amount of effective data transferred from one distribution to another--follows a simple power law in parameter count and fine-tuning data size. However, their analysis left much room for further exploration: it considered only transfer from English to Python, and it did not examine the relationship between pre-training data size and the degree of downstream transfer. Scaling laws for transfer are important to study because they indicate the degree to which progress in machine learning is bottlenecked by data for specific tasks: to achieve high performance on a given task, a standard approach in the foundation model paradigm is to pre-train a model on a large, diverse distribution and then fine-tune it on the downstream task of interest (Bommasani et al. 2022).
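For concreteness, the functional form referenced above (from Hernandez et al. 2021) models effective data transferred D_T as a power law, D_T = k · D_F^α · N^β, where D_F is the fine-tuning dataset size, N is the (non-embedding) parameter count, and k, α, β are fitted constants. The sketch below evaluates this fit; the default constants are roughly the values Hernandez et al. report for English-to-Python transfer (k ≈ 1.9e4, α ≈ 0.18, β ≈ 0.38) and should be read as illustrative assumptions, not as results of this paper.

```python
# Minimal sketch: the Hernandez et al. (2021) power law for effective data
# transferred, D_T = k * D_F**alpha * N**beta. The default constants are
# approximate fitted values for English->Python transfer from that paper,
# used here purely for illustration.

def effective_data_transferred(
    n_params: float,         # N: non-embedding parameter count
    finetune_tokens: float,  # D_F: fine-tuning dataset size, in tokens
    k: float = 1.9e4,
    alpha: float = 0.18,
    beta: float = 0.38,
) -> float:
    """Tokens of pre-training data the transfer is 'worth' on the target task."""
    return k * finetune_tokens**alpha * n_params**beta

# Example: a 1B-parameter model fine-tuned on 10M tokens.
print(f"{effective_data_transferred(n_params=1e9, finetune_tokens=1e7):.3e}")
```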
