AITopics | kernel regime

Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well-studied in kernel regime networks -- of the infinitely many global minima, gradient flow selects exactly the vanishing ridge solution -- and underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove that ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with particular damage done to pretrained networks where the implicit prior is informative. We resolve this by axiomatising the canonical regulariser as a regime-agnostic function-space energy and lift, which uniquely identifies ridge in the kernel regime, and crucially generalises to the feature-learning regime. By studying the Riemannian geometry of feature-learning networks, we derive geodesic ridge from our framework, generalising ridge to the feature-learning regime. Correspondingly, we prove the canonical function-space prior is a Riemannian Gibbs Process, generalising the more familiar Gaussian Process. As a practical contribution, we propose arc ridge as a minimax-robust, scalable surrogate to geodesic ridge, revealing a deep relationship between early stopping and canonical regularisation across learning regimes. Finally, we demonstrate the consequences of our theory empirically on both image processing and NLP transfer-learning problems.
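
The kernel-regime statement in this abstract (among the infinitely many global minima, gradient flow selects the vanishing ridge solution) can be illustrated with a minimal NumPy sketch. It uses an overparameterised linear model as a stand-in for a linearised network; the data, dimensions, step size and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gradient_descent_from_zero(X, y, lr, steps):
    """Plain gradient descent on 0.5 * ||X w - y||^2, started at w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w

def ridge_solution(X, y, lam):
    """Minimiser of 0.5 * ||X w - y||^2 + 0.5 * lam * ||w||^2 (dual form)."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(0)
n, d = 5, 50  # fewer samples than parameters: infinitely many interpolants
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Step size equal to 1 / (largest eigenvalue of X^T X), which keeps descent stable.
lr = 1.0 / np.linalg.norm(X, 2) ** 2
w_gd = gradient_descent_from_zero(X, y, lr, steps=5000)

# Minimum-norm interpolant and a ridge solution with a very small penalty.
w_min_norm = np.linalg.pinv(X) @ y
w_ridge_small = ridge_solution(X, y, lam=1e-8)

gap_gd = np.linalg.norm(w_gd - w_min_norm)
gap_ridge = np.linalg.norm(w_ridge_small - w_min_norm)
print(f"GD vs min-norm: {gap_gd:.2e}, small ridge vs min-norm: {gap_ridge:.2e}")
```

Both gaps should be close to zero: in this linear setting, descent from zero never leaves the row space of `X`, so it lands on the same interpolant that ridge approaches as the penalty vanishes. The sketch says nothing about the feature-learning regime, where the abstract states that this equivalence breaks.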

artificial intelligence, machine learning, mflow, (19 more...)

arXiv.org Machine Learning

2605.1818

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

fc2022c89b61c76bbef978f1370660bf-Supplemental.pdf

Neural Information Processing Systems, Feb-11-2026, 05:43:10 GMT

exp, initialization, regime, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

fc2022c89b61c76bbef978f1370660bf-Paper.pdf

Neural Information Processing Systems, Feb-11-2026, 05:43:03 GMT

implicit bias, regime, trajectory, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

Neural Information Processing Systems, Dec-25-2025, 05:21:55 GMT

We provide quantitative bounds measuring the $L^2$ difference in function space between the trajectory of a finite-width network trained on finitely many samples from the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel not just on the training set but over the entire input space. This bias depends on the model architecture and input distribution alone and thus does not depend on the target function which does not need to be in the RKHS of the kernel. The result is valid for deep architectures with fully connected, convolutional, and residual layers. Furthermore the width does not need to grow polynomially with the number of samples in order to obtain high probability bounds up to a stopping time. The proof exploits the low-effective-rank property of the Fisher Information Matrix at initialization, which implies a low effective dimension of the model (far smaller than the number of parameters). We conclude that local capacity control from the low effective rank of the Fisher Information Matrix is still underexplored theoretically.
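
As a rough illustration of the spectral bias described in this abstract, the sketch below runs gradient descent in function space under a fixed kernel and measures how much of the residual is left along each eigenvector of the Gram matrix. It only covers the training set, not the whole input space that the paper's bounds address, and it uses an RBF kernel as a stand-in for a Neural Tangent Kernel; the kernel, data, step count and names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
x = np.sort(rng.uniform(-1.0, 1.0, size=n))
y = rng.standard_normal(n)  # arbitrary targets, not tied to the kernel

# Stand-in kernel (assumption): an RBF Gram matrix, not an actual NTK.
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)

# Eigendecomposition, sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Function-space gradient descent f <- f - lr * K (f - y), started at f = 0.
lr = 0.5 / eigvals[0]
f = np.zeros(n)
for _ in range(20):
    f = f - lr * K @ (f - y)

# Fraction of the initial residual left along each eigenvector.
initial = eigvecs.T @ y
remaining = eigvecs.T @ (y - f)
fraction_left = np.abs(remaining) / np.abs(initial)

print("top 3 eigen-directions:   ", fraction_left[:3])
print("bottom 3 eigen-directions:", fraction_left[-3:])
```

Each residual component shrinks by a factor of `1 - lr * eigvals[i]` per step, so directions with large eigenvalues are fitted first while those with small eigenvalues are barely touched after the same number of steps.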

deep network, spectral bias, training set, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

will incorporate all comments in the next revision

Neural Information Processing Systems, Oct-2-2025, 08:27:34 GMT

We distinguish two extremal regimes in terms of generalization behavior: "adaptive" and

artificial intelligence, machine learning, regime, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

fc2022c89b61c76bbef978f1370660bf-Supplemental.pdf

Neural Information Processing Systems, Aug-17-2025, 09:55:37 GMT

artificial intelligence, exp, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

fc2022c89b61c76bbef978f1370660bf-Paper.pdf

Neural Information Processing Systems, Aug-17-2025, 09:47:38 GMT

artificial intelligence, implicit bias, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

fc2022c89b61c76bbef978f1370660bf-AuthorFeedback.pdf

Neural Information Processing Systems, Aug-17-2025, 09:47:25 GMT

artificial intelligence, machine learning, training accuracy, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity

Sahs, Justin, Pyle, Ryan, Anselmi, Fabio, Patel, Ankit

arXiv.org Artificial Intelligence, Mar-13-2025

Despite classical statistical theory predicting severe overfitting, modern massively overparameterized neural networks still generalize well. This unexpected property is attributed to the network's so-called implicit bias, which describes its propensity to converge to solutions that generalize effectively, among the many possible solutions that correctly label the training data. The aim of our research is to explore this bias from a new perspective, focusing on how non-linear activation functions contribute to shaping it. First, we introduce a reparameterization which removes a continuous weight rescaling symmetry. Second, in the kernel regime, we leverage this reparameterization to generalize recent findings that relate shallow Neural Networks to the Radon transform, deriving an explicit formula for the implicit bias induced by a broad class of activation functions. Specifically, by utilizing the connection between the Radon transform and the Fourier transform, we interpret the kernel regime's inductive bias as minimizing a spectral seminorm that penalizes high-frequency components, in a manner dependent on the activation function. Finally, in the adaptive regime, we demonstrate the existence of local dynamical attractors that facilitate the formation of clusters of hyperplanes where the input to a neuron's activation function is zero, yielding alignment between many neurons' response functions. We confirm these theoretical results with simulations. Altogether, our work provides a deeper understanding of the mechanisms underlying the generalization capabilities of overparameterized neural networks and their relation to the implicit bias, offering potential pathways for designing more efficient and robust models.
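
The continuous weight rescaling symmetry mentioned in this abstract can be made concrete for the ReLU case, which is an assumption here since the abstract covers a broad class of activations. The sketch below checks numerically that scaling a neuron's input weight and bias by a positive constant, and dividing its output weight by the same constant, leaves a shallow network's function unchanged. The function name, width and parameter values are illustrative only, and the paper's own reparameterization is not reproduced.

```python
import numpy as np

def shallow_relu_net(x, w, b, a):
    """f(x) = sum_j a_j * relu(w_j * x + b_j) for an array of scalar inputs x."""
    pre_activation = np.outer(x, w) + b        # shape (len(x), width)
    return np.maximum(pre_activation, 0.0) @ a

rng = np.random.default_rng(0)
width = 8
w, b, a = (rng.standard_normal(width) for _ in range(3))

# One positive scale per neuron: (w_j, b_j, a_j) -> (c_j w_j, c_j b_j, a_j / c_j).
c = rng.uniform(0.5, 2.0, size=width)

x = np.linspace(-2.0, 2.0, 101)
f_original = shallow_relu_net(x, w, b, a)
f_rescaled = shallow_relu_net(x, c * w, c * b, a / c)

max_diff = np.max(np.abs(f_original - f_rescaled))
print(f"largest difference between the two networks: {max_diff:.2e}")
```

The invariance follows from `relu(c * z) = c * relu(z)` for `c > 0`, so many parameter settings describe the same function; a reparameterization that removes this redundancy is what the abstract refers to.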

activation function, equation, fourier transform, (16 more...)

arXiv.org Artificial Intelligence

2503.10587

Country:

North America > United States > Texas > Harris County > Houston (0.04)
Oceania > Australia (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

Neural Information Processing Systems, Jan-18-2025, 19:40:58 GMT

We provide quantitative bounds measuring the $L^2$ difference in function space between the trajectory of a finite-width network trained on finitely many samples from the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel not just on the training set but over the entire input space. This bias depends on the model architecture and input distribution alone and thus does not depend on the target function which does not need to be in the RKHS of the kernel. The result is valid for deep architectures with fully connected, convolutional, and residual layers. Furthermore the width does not need to grow polynomially with the number of samples in order to obtain high probability bounds up to a stopping time.

deep network, kernel regime, spectral bias, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Canonical Regularisation of Wide Feature-Learning Neural Networks
