Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Dec-24-2025, 15:41:45 GMT–Neural Information Processing Systems

In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, i.e., $n,d\to\infty$ with $n/d\to\psi\in(0,\infty)$, we ask the following question: how large should the spike magnitude $\theta$ (i.e., the strength of the low-dimensional component) be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta\ge 1-\frac{1}{p}$ is both necessary and sufficient, whereas for two-layer neural networks trained with gradient descent, $\beta> 1-\frac{1}{k}$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structures in the data. Further, since $k\le p$ by definition, neural networks can adapt to such structures more effectively.
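
The comparison between the two thresholds follows from a one-line inequality, spelled out here as a worked illustration. The specific values $p=4$ and $k=2$ are hypothetical and chosen only to make the gap visible; they are not taken from the source.

$$
k\le p \;\Longrightarrow\; \frac{1}{k}\ge\frac{1}{p} \;\Longrightarrow\; 1-\frac{1}{k}\;\le\;1-\frac{1}{p}.
$$

$$
\text{For } p=4,\ k=2:\qquad 1-\frac{1}{k}=\frac{1}{2}, \qquad 1-\frac{1}{p}=\frac{3}{4}.
$$

Under these illustrative values, any $\beta\in\left(\tfrac{1}{2},\tfrac{3}{4}\right)$ satisfies the sufficient condition stated for two-layer neural networks but falls below the threshold that is necessary for kernel ridge regression. When $k=p$ the two thresholds coincide, apart from the strict versus non-strict inequality.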

