Scaling laws for learning with real and surrogate data

Ayush Jain, Andrea Montanari

Neural Information Processing Systems 

Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and is often a bottleneck in machine learning. One may instead augment a small set of n data points from the target distribution with data from more accessible sources, e.g.