Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective
Neural Information Processing Systems
In the proportional asymptotic limit, where the number of training examples n and the dimensionality d jointly diverge with n, d \to \infty and n/d \to \psi \in (0, \infty), we ask the following question: how large should the spike magnitude \theta (i.e., the strength of the low-dimensional component), parameterized as \theta \asymp d^{\beta}, be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn the target function f_*? We show that for kernel ridge regression, \beta \ge 1-\frac{1}{p} is both sufficient and necessary. For two-layer neural networks trained with gradient descent, in contrast, \beta \ge 1-\frac{1}{k} suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since k \le p by definition, neural networks can adapt to such structure more effectively.
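To make the setup concrete, the following is a minimal numpy sketch of sampling data from a spiked covariance model in the proportional regime. The sampler, the choice of spike direction, and the parameterization theta = d**beta are illustrative assumptions, not code from the paper:

```python
import numpy as np

def sample_spiked(n, d, beta, seed=None):
    """Draw n samples from N(0, I_d + theta * u u^T), theta = d**beta.

    Assumed parameterization for illustration: the spike magnitude
    grows polynomially in the dimension, theta = d**beta.
    """
    rng = np.random.default_rng(seed)
    theta = d ** beta
    u = np.zeros(d)
    u[0] = 1.0                              # spike direction (unit vector)
    z = rng.standard_normal((n, d))         # isotropic bulk component
    g = rng.standard_normal(n)              # coefficient along the spike
    X = z + np.sqrt(theta) * np.outer(g, u)
    return X, u, theta

# Proportional regime: n/d -> psi with psi = 2.
psi, d = 2.0, 500
n = int(psi * d)
X, u, theta = sample_spiked(n, d, beta=0.5, seed=0)

# Variance along the spike direction is ~ 1 + theta;
# along any orthogonal direction it stays ~ 1.
var_spike = np.var(X @ u)
v = np.zeros(d); v[1] = 1.0
var_orth = np.var(X @ v)
```

Larger beta inflates theta and thus the variance gap between the spike direction and the bulk, which is the sense in which the low-dimensional signal becomes easier to detect and learn.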