AITopics | linear network

Neural networks are known to develop latent representations that are $aligned$, namely structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple $linear$ network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and a stronger alignment does not necessarily correspond to better generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity, decoupled from generalization performance.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Machine Learning

2605.26973

Country: Europe > Italy (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

Lauditi, Clarissa, Pehlevan, Cengiz, Bordelon, Blake

arXiv.org Machine LearningMay-22-2026

We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$μ$P scaling and (2) deep linear networks in the proportional high-dimensional limit, where width, input dimension, and sample size diverge with fixed ratios. Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, $μ$P yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. We show that this bulk+outlier picture is descriptive of simple tasks with small output channels, but that tasks involving large numbers of outputs (ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. We develop a toy model with extensive output channels that recapitulates this phenomenon and show that edge of the spectrum still converges for sufficiently wide networks.

artificial intelligence, arxiv preprint arxiv, machine learning, (17 more...)

arXiv.org Machine Learning

2605.0787

Country: North America > United States (0.46)

Genre: Research Report (0.49)

Industry:

Telecommunications > Networks (0.40)
Information Technology > Networks (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

d067d16e3e5fe8fa8a3e62909907659a-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 10:38:20 GMT

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe > Switzerland (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.67)

Add feedback

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

Neural Information Processing SystemsApr-25-2026, 17:13:18 GMT

We analyze the dynamics of finite width effects in wide but finite feature learning neural networks. Starting from a dynamical mean field theory description of infinite width deep neural network kernel and prediction dynamics, we provide a characterization of the O(1/ width) fluctuations of the DMFT order parameters over random initializations of the network weights. Our results, while perturbative in width, unlike prior analyses, are non-perturbative in the strength of feature learning. In the lazy limit of network training, all kernels are random but static in time and the prediction variance has a universal form. However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently.

artificial intelligence, machine learning, variance, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

17a9ab4190289f0e1504bbb98d1d111a-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 08:29:03 GMT

artificial intelligence, iterate, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data

Neural Information Processing SystemsApr-25-2026, 04:00:28 GMT

In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss. In particular, we show global directional convergence guarantees from a polynomial rate to a linear rate for (deep) linear networks with spherically symmetric data distribution, which can be viewed as a specific zero-margin dataset. Our results do not require the assumptions in other works such as small initial loss, presumed convergence of weight direction, or overparameterization. We also characterize our findings in experiments.

artificial intelligence, convergence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

24ec8468b67314c2013d215b77034476-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 04:00:15 GMT

artificial intelligence, convergence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Add feedback

1baff70e2669e8376347efd3a874a341-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 23:10:40 GMT

artificial intelligence, derivation, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Add feedback

1baff70e2669e8376347efd3a874a341-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 23:10:37 GMT

artificial intelligence, linear network, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Deep linear networks for regression are implicitly regularized towards flat minima

Neural Information Processing SystemsMar-21-2026, 13:15:26 GMT

The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we study the sharpness of deep linear networks for univariate regression. Minimizers can have arbitrarily large sharpness, but not an arbitrarily small one. Indeed, we show a lower bound on the sharpness of minimizers, which grows linearly with depth. We then study the properties of the minimizer found by gradient flow, which is the limit of gradient descent with vanishing learning rate.

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.63)

Add feedback

Filters

Collaborating Authors

linear network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Signal-to-Noise Ratio and Sample Size Govern Representational Alignment in Neural Networks

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

d067d16e3e5fe8fa8a3e62909907659a-Paper-Conference.pdf

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

17a9ab4190289f0e1504bbb98d1d111a-Supplemental-Conference.pdf

Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data

24ec8468b67314c2013d215b77034476-Paper.pdf

1baff70e2669e8376347efd3a874a341-Supplemental.pdf

1baff70e2669e8376347efd3a874a341-Paper.pdf

Deep linear networks for regression are implicitly regularized towards flat minima