Breaking the curse of dimensionality for linear rules: optimal predictors over the ellipsoid

Sep-26-2025–arXiv.org Machine Learning

In this work, we address the following question: What minimal structural assumptions are needed to prevent the degradation of statistical learning bounds with increasing dimensionality? We investigate this question in the classical statistical setting of signal estimation from $n$ independent linear observations $Y_i = X_i^{\top}\theta + \varepsilon_i$. Our focus is on the generalization properties of a broad family of predictors that can be expressed as linear combinations of the training labels, $f(X) = \sum_{i=1}^{n} l_{i}(X) Y_i$. This class -- commonly referred to as linear prediction rules -- encompasses a wide range of popular parametric and non-parametric estimators, including ridge regression, gradient descent, and kernel methods. Our contributions are twofold. First, we derive non-asymptotic upper and lower bounds on the generalization error for this class under the assumption that the Bayes predictor $\theta$ lies in an ellipsoid. Second, we establish a lower bound for the subclass of rotationally invariant linear prediction rules when the Bayes predictor is fixed. Our analysis highlights two fundamental contributions to the risk: (a) a variance-like term that captures the intrinsic dimensionality of the data; (b) the noiseless error, a term that arises specifically in the high-dimensional regime. These findings shed light on the role of structural assumptions in mitigating the curse of dimensionality.
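To make the notion of a linear prediction rule concrete, here is a minimal sketch that checks numerically that ridge regression can be written as $f(X) = \sum_{i=1}^{n} l_{i}(X) Y_i$. Everything in it is an illustrative assumption rather than the paper's setup: the Gaussian design, the noise level, the regularization scaling `n * lam`, and the helper name `ridge_weights` are all hypothetical choices.

```python
import numpy as np

def ridge_weights(X_train, x_new, lam):
    """Weights l_i(x_new) such that the ridge prediction equals sum_i l_i(x_new) * Y_i.

    Hypothetical helper for illustration; the n * lam scaling is an assumed convention.
    """
    n, d = X_train.shape
    # Ridge estimator: theta_hat = (X^T X + n*lam*I)^{-1} X^T Y, so
    # f(x) = x^T theta_hat = l(x)^T Y with l(x) = X (X^T X + n*lam*I)^{-1} x.
    gram = X_train.T @ X_train + n * lam * np.eye(d)
    return X_train @ np.linalg.solve(gram, x_new)

# Synthetic data from the linear model Y_i = X_i^T theta + eps_i (assumed Gaussian).
rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
theta = rng.normal(size=d)
X = rng.normal(size=(n, d))
Y = X @ theta + 0.1 * rng.normal(size=n)
x_new = rng.normal(size=d)

# Prediction written as a linear combination of the training labels.
weights = ridge_weights(X, x_new, lam)
pred_linear_rule = weights @ Y

# Same prediction obtained by first fitting theta_hat, then evaluating at x_new.
theta_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)
pred_direct = x_new @ theta_hat

agree = bool(np.isclose(pred_linear_rule, pred_direct))
```

The weights depend only on the training inputs and the query point, never on the labels, which is what makes the predictor linear in $Y$.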


