Scaling Laws in Linear Regression: Compute, Parameters, and Data
Neural Information Processing Systems
Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance. We study the theory of scaling laws in an infinite-dimensional linear regression setup. Specifically, we consider a model with M parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using N data points.
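To make the setup concrete, the following is a minimal sketch of the described training procedure: a linear model with M parameters applied to sketched covariates, fit by one pass of SGD over N samples, with the test error estimated by Monte Carlo. The dimensions, step size, sketching matrix, noise level, and covariate distribution are illustrative assumptions, not the paper's exact construction or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000      # ambient covariate dimension (finite stand-in for the infinite-dimensional setup)
M = 100       # number of model parameters (sketch dimension)
N = 10_000    # number of one-pass SGD samples
step = 0.01   # constant step size (assumed; the paper analyzes specific schedules)

# Ground-truth linear target with decaying coefficients (illustrative choice).
w_star = rng.standard_normal(d) / np.arange(1, d + 1)

# Fixed random sketching matrix S: the model predicts <theta, S x> with theta in R^M.
S = rng.standard_normal((M, d)) / np.sqrt(d)

theta = np.zeros(M)
for _ in range(N):
    # Covariates with power-law-decaying variances (assumed distribution).
    x = rng.standard_normal(d) / np.arange(1, d + 1) ** 0.5
    y = w_star @ x + 0.1 * rng.standard_normal()   # noisy label
    z = S @ x                                      # sketched covariates
    grad = (theta @ z - y) * z                     # squared-loss gradient for this sample
    theta -= step * grad                           # one SGD step per sample (one pass)

# Monte Carlo estimate of the test error of the sketched predictor.
X_test = rng.standard_normal((2000, d)) / np.arange(1, d + 1) ** 0.5
y_test = X_test @ w_star
test_err = np.mean((X_test @ S.T @ theta - y_test) ** 2)
print(f"test error with M={M}, N={N}: {test_err:.4f}")
```

Sweeping M and N in such a simulation is one way to visualize how the test error trades off against model size and data size under this kind of sketched-regression setup.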