An Empirical Study of Scaling Laws for Transfer

Barnett, Matthew

arXiv.org Artificial Intelligence 

In recent years, a number of papers have uncovered machine learning scaling laws--empirical regularities that describe how a model's performance improves as a function of scale, usually parameter count and dataset size (Hestness et al. 2017, Kaplan et al. 2020, Hoffmann et al. 2022). Hernandez et al. 2021 described scaling laws for transfer learning, showing how the transfer properties of models change as a function of model size. Their primary result was that the degree of transfer--as measured by the amount of effective data transferred from one distribution to another--follows a simple power law in parameter count and fine-tuning data size. However, their analysis left much room for further exploration: it considered only transfer from English to Python, and it did not examine the relationship between pre-training data size and the degree of downstream transfer. Scaling laws for transfer are important to study because they indicate the degree to which progress in machine learning is bottlenecked by data for specific tasks: to achieve high performance on a given task, a standard approach in the foundation model paradigm is to pre-train a model on a large, diverse distribution and then fine-tune it on the downstream task of interest (Bommasani et al. 2022).
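For concreteness, the functional form referenced above (from Hernandez et al. 2021) models effective data transferred D_T as a power law, D_T = k · D_F^α · N^β, where D_F is the fine-tuning dataset size, N is the (non-embedding) parameter count, and k, α, β are fitted constants. The sketch below evaluates this fit; the default constants are roughly the values Hernandez et al. report for English-to-Python transfer (k ≈ 1.9e4, α ≈ 0.18, β ≈ 0.38) and should be read as illustrative assumptions, not as results of this paper.

```python
# Minimal sketch: the Hernandez et al. (2021) power law for effective data
# transferred, D_T = k * D_F**alpha * N**beta. The default constants are
# approximate fitted values for English->Python transfer from that paper,
# used here purely for illustration.

def effective_data_transferred(
    n_params: float,         # N: non-embedding parameter count
    finetune_tokens: float,  # D_F: fine-tuning dataset size, in tokens
    k: float = 1.9e4,
    alpha: float = 0.18,
    beta: float = 0.38,
) -> float:
    """Tokens of pre-training data the transfer is 'worth' on the target task."""
    return k * finetune_tokens**alpha * n_params**beta

# Example: a 1B-parameter model fine-tuned on 10M tokens.
print(f"{effective_data_transferred(n_params=1e9, finetune_tokens=1e7):.3e}")
```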
