Scaling laws for learning with real and surrogate data

Neural Information Processing Systems 

Collecting large quantities of high-quality data can be prohibitively expensive or impractical, and is a bottleneck in machine learning. One may instead augment a small set of n data points from the target distribution with data from more accessible sources; we refer to such data as 'surrogate data'. We study a weighted empirical risk minimization (ERM) approach for integrating surrogate data into training. We analyze this method mathematically under several classical statistical models, and validate our findings empirically on datasets from different domains.
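To make the weighted ERM idea concrete, here is a minimal sketch, not the paper's actual method: it fits a linear model by minimizing a convex combination of the mean squared loss on the real data and on the surrogate data. The mixing weight `alpha`, the function name, and all hyperparameters are illustrative assumptions, not taken from the abstract.

```python
import numpy as np

def weighted_erm_linreg(X_real, y_real, X_surr, y_surr,
                        alpha=0.7, lr=0.1, steps=500):
    """Minimize alpha * mse(real) + (1 - alpha) * mse(surrogate)
    over a linear model, via plain gradient descent.
    `alpha` is a hypothetical mixing weight for illustration only."""
    w = np.zeros(X_real.shape[1])
    for _ in range(steps):
        # Per-source gradients of the mean squared error
        g_real = X_real.T @ (X_real @ w - y_real) / len(y_real)
        g_surr = X_surr.T @ (X_surr @ w - y_surr) / len(y_surr)
        # Weighted combination: more weight on the (scarce) real data
        w -= lr * (alpha * g_real + (1 - alpha) * g_surr)
    return w

# Toy setup: few real samples, many surrogate samples whose
# underlying parameter is slightly shifted (a distribution shift).
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X_real = rng.normal(size=(20, 2))
y_real = X_real @ w_true + 0.1 * rng.normal(size=20)
X_surr = rng.normal(size=(200, 2))
y_surr = X_surr @ (w_true + 0.3) + 0.1 * rng.normal(size=200)

w_hat = weighted_erm_linreg(X_real, y_real, X_surr, y_surr)
```

With standard-normal features, the minimizer of the weighted objective lands between the real and surrogate population parameters, pulled toward the real one in proportion to `alpha`.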