AITopics | interpolator

The successful training of neural networks hinges on the use of first-order optimization methods, yet the theoretical characterization of these methods remains incomplete. This is especially true in settings with mild overparameterization. In this work, we study the gradient flow dynamics of two-layer ReLU networks from small initialization with orthogonal training data. We prove the limiting flow converges to a saddle-to-saddle jump process as the initialization scale tends to zero, revealing an incremental learning phenomenon in which a new neuron activates at each saddle. This analysis recovers the known result of Dana et al. (2025, arXiv:2502.16977) that the network interpolates the training data with high probability as soon as $m \gtrsim \log(n)$, where $m$ is the network width and $n$ is the number of training samples. This incremental process characterization also allows us to derive a novel implicit bias result: the learned interpolator has a squared $\ell_2$-norm scaling as $\sqrt{n}$, which is within a constant factor of the minimal $\ell_2$-norm interpolator. More broadly, our work provides the first rigorous proof of an incremental learning process for ReLU networks, whilst suggesting mildly overparameterized networks can converge to interpolating solutions whose complexity is of the same order as that of the optimal interpolator.

artificial intelligence, machine learning, neuron, (19 more...)

arXiv.org Machine Learning

2605.27097

Country: North America (0.15)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ed7b8e1312f6ba8af6e4316dcd28bb3d-Paper-Conference.pdf

Neural Information Processing Systems | Apr-28-2026, 07:41:26 GMT

artificial intelligence, machine learning, regularization, (17 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)

b444ad72520a5f5c467343be88e352ed-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 16:32:15 GMT

artificial intelligence, equation, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Provable Tempered Overfitting of Minimal Nets and Typical Nets

Neural Information Processing Systems | Feb-15-2026, 01:52:38 GMT

For both learning rules, we prove overfitting is tempered.

artificial intelligence, machine learning, threshold network, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

ed7b8e1312f6ba8af6e4316dcd28bb3d-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 17:22:34 GMT

We find a large range of behavior that can be precisely characterized by a new measure of confounding strength.

artificial intelligence, machine learning, regularization, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

Neural Information Processing Systems | Feb-12-2026, 10:13:19 GMT

Currently, most theoretical works on implicit regularisation have primarily focused on continuous-time approximations of (S)GD, where the impact of crucial hyperparameters such as the stepsize and the minibatch size is ignored. One such common simplification is to analyse gradient flow, which is a continuous-time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, this analysis does not capture the effect of stepsize or stochasticity.
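
As a rough illustration of why gradient flow cannot capture stepsize effects, the sketch below compares plain gradient descent with the closed-form gradient flow solution on a one-dimensional quadratic loss. The loss, the function names and the numerical values are all assumptions chosen for illustration; none of them come from the paper, which studies diagonal linear networks.

```python
import math


def gd_quadratic(theta0, a, stepsize, total_time):
    """Gradient descent on the toy loss L(theta) = 0.5 * a * theta**2.

    Runs round(total_time / stepsize) steps so that different stepsizes
    cover the same amount of "continuous time".
    """
    steps = round(total_time / stepsize)
    theta = theta0
    for _ in range(steps):
        theta -= stepsize * a * theta  # the gradient of L is a * theta
    return theta


def gradient_flow_quadratic(theta0, a, total_time):
    """Closed-form solution of the gradient flow d(theta)/dt = -a * theta."""
    return theta0 * math.exp(-a * total_time)


# Illustrative values only (assumptions, not taken from the paper).
theta0, a, T = 1.0, 1.0, 6.0
flow = gradient_flow_quadratic(theta0, a, T)
small = gd_quadratic(theta0, a, stepsize=0.001, total_time=T)
large = gd_quadratic(theta0, a, stepsize=3.0, total_time=T)

print(f"gradient flow:      {flow:.6f}")
print(f"GD, stepsize 0.001: {small:.6f}")
print(f"GD, stepsize 3.0:   {large:.6f}")
```

On this toy problem, GD with a tiny stepsize tracks the flow closely, while a stepsize above 2/a makes the iterates grow instead of shrink. The gradient flow solution has no stepsize parameter, so it cannot distinguish these two regimes.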

artificial intelligence, machine learning, stepsize, (15 more...)

Neural Information Processing Systems

Country: