Optimal scaling laws in learning hierarchical multi-index models

Leonardo Defilippis, Florent Krzakala, Bruno Loureiro, Antoine Maillard

arXiv.org Machine Learning 

In this work, we provide a sharp theory of scaling laws for two-layer neural networks trained on a class of hierarchical multi-index targets, in a genuinely representation-limited regime. We derive exact information-theoretic scaling laws for subspace recovery and prediction error, revealing how the hierarchical features of the target are sequentially learned through a cascade of phase transitions. We further show that these optimal rates are achieved by a simple, target-agnostic spectral estimator, which can be interpreted as the small learning-rate limit of gradient descent on the first-layer weights. Once an adapted representation is identified, the readout can be learned statistically optimally using an efficient procedure. As a consequence, we provide a unified and rigorous explanation of scaling laws, plateau phenomena, and spectral structure in shallow neural networks trained on such hierarchical targets.
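The abstract does not spell out the spectral estimator itself. As an illustration only, the following is a minimal sketch of a standard second-moment (label-weighted covariance) spectral estimator commonly used for subspace recovery in multi-index models; the function name, the toy target, and all parameter choices below are our own illustrative assumptions, not the paper's procedure.

```python
import numpy as np

# Illustrative sketch (not the paper's exact method): recover the hidden
# index subspace of a multi-index target y = g(W*^T x) from Gaussian data
# via the eigenvectors of a label-weighted second-moment matrix.

def spectral_subspace_estimate(X, y, k):
    """Estimate a k-dimensional index subspace from samples (X, y).

    X : (n, d) array of i.i.d. standard Gaussian inputs.
    y : (n,) array of responses.
    k : assumed dimension of the hidden index subspace.
    """
    n, d = X.shape
    # Label-weighted second moment, (1/n) * sum_i y_i x_i x_i^T.
    # Subtracting mean(y) * I centers it so that, for Gaussian inputs,
    # directions orthogonal to the index subspace average to zero.
    M = (X.T * y) @ X / n - y.mean() * np.eye(d)
    # The top-k eigenvectors (by absolute eigenvalue) span the estimate.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(eigvals))[:k]
    return eigvecs[:, order]  # (d, k) orthonormal basis estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, k = 50, 20_000, 2
    # Ground-truth index subspace (orthonormal columns).
    W = np.linalg.qr(rng.standard_normal((d, k)))[0]
    X = rng.standard_normal((n, d))
    z = X @ W
    # Toy two-index target; both directions have nonzero second
    # Hermite coefficients, so this estimator can detect them.
    y = z[:, 0] ** 2 + np.maximum(z[:, 1], 0.0)
    U = spectral_subspace_estimate(X, y, k)
    # Subspace overlap in [0, 1]; 1.0 means exact recovery.
    overlap = np.linalg.norm(U.T @ W, ord="fro") ** 2 / k
    print(f"subspace overlap ≈ {overlap:.3f}")
```

This kind of label-weighted statistic is often how the gradient-descent connection mentioned in the abstract is understood: an averaged first gradient step on random first-layer weights concentrates around similar label-weighted moments of the data, so its leading directions align with the target's index subspace.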
