eigenspectrum
A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point
Couto, Carlos, Mourão, José, Figueiredo, Mário A. T., Ribeiro, Pedro
Near an optimal learning point of a neural network, the learning performance of gradient-descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian eigenspectrum for several classes of teacher-student problems in which the teacher and student networks have matching weights, showing that the smaller eigenvalues of the Hessian determine long-time learning performance. For linear networks, we establish analytically that, in the large-network limit, the spectrum asymptotically follows the convolution of a scaled chi-squared distribution with a scaled Marchenko-Pastur distribution. We numerically analyse the Hessian spectrum for polynomial and other non-linear networks. Furthermore, we show that the rank of the Hessian matrix can be seen as an effective number of parameters for networks using polynomial activation functions. For a generic non-linear activation function, such as the error function, we empirically observe that the Hessian matrix is always full rank.
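The linear-network result can be checked numerically in its simplest setting. The sketch below is an illustrative assumption, not the paper's architecture: a one-layer linear student evaluated at the teacher's weights with Gaussian inputs, where the Hessian eigenvalues reduce to copies of the empirical input-covariance spectrum and only the Marchenko-Pastur part of the limiting law appears.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's): a one-layer linear student
# y = W x fit to a matching linear teacher. At the optimum, the Hessian of the
# mean squared loss w.r.t. vec(W) is I_{d_out} (Kronecker) Sigma_hat, with
# Sigma_hat = X X^T / n the empirical input covariance, so its eigenvalues are
# those of Sigma_hat, each repeated d_out times. For i.i.d. Gaussian inputs
# these follow the Marchenko-Pastur law with ratio q = d_in / n.
rng = np.random.default_rng(0)
d_in, d_out, n = 200, 5, 1000
X = rng.standard_normal((d_in, n))
Sigma_hat = X @ X.T / n

hessian_eigs = np.repeat(np.linalg.eigvalsh(Sigma_hat), d_out)

# Marchenko-Pastur density for comparison
q = d_in / n
lam_minus, lam_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
grid = np.linspace(lam_minus, lam_plus, 400)
mp_density = np.sqrt((lam_plus - grid) * (grid - lam_minus)) / (2 * np.pi * q * grid)

hist, edges = np.histogram(hessian_eigs, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |empirical - MP| density gap:",
      np.abs(hist - np.interp(centers, grid, mp_density)).max())
```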
- Europe > Portugal > Lisbon > Lisbon (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models
Mao, Jialin, Griniasty, Itay, Sun, Yan, Transtrum, Mark K., Sethna, James P., Chaudhari, Pratik
Recent experiments have shown that training trajectories of multiple deep neural networks with different architectures, optimization algorithms, hyper-parameter settings, and regularization methods evolve on a remarkably low-dimensional "hyper-ribbon-like" manifold in the space of probability distributions. Inspired by the similarities in the training trajectories of deep networks and linear networks, we analytically characterize this phenomenon for the latter. We show, using tools in dynamical systems theory, that the geometry of this low-dimensional manifold is controlled by (i) the decay rate of the eigenvalues of the input correlation matrix of the training data, (ii) the relative scale of the ground-truth output to the weights at the beginning of training, and (iii) the number of steps of gradient descent. By analytically computing and bounding the contributions of these quantities, we characterize phase boundaries of the region where hyper-ribbons are to be expected. We also extend our analysis to kernel machines and linear models that are trained with stochastic gradient descent.
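A toy version of this picture can be reproduced with plain linear regression. The sketch below is a hedged stand-in for the paper's analysis, not its actual procedure: it runs gradient descent from several random initializations on inputs whose correlation spectrum decays (quantity (i)), with small initial weights relative to the ground truth (quantity (ii)) and a fixed number of steps (quantity (iii)), then measures how few principal components capture the spread of the prediction trajectories.

```python
import numpy as np

# Minimal sketch under assumed settings: several linear models trained by
# gradient descent on the same data; we record the prediction vector X @ w at
# every step and check the effective dimensionality of all trajectory points.
rng = np.random.default_rng(1)
n, d, steps, runs, lr = 500, 50, 200, 10, 0.05

# Inputs with a fast-decaying correlation spectrum (condition (i))
eigs = 1.0 / np.arange(1, d + 1) ** 1.5
X = rng.standard_normal((n, d)) * np.sqrt(eigs)
w_true = rng.standard_normal(d)
y = X @ w_true

trajectories = []
for _ in range(runs):
    w = 0.1 * rng.standard_normal(d)      # small init relative to w_true (condition (ii))
    for _ in range(steps):                # number of GD steps (condition (iii))
        w -= lr * X.T @ (X @ w - y) / n
        trajectories.append(X @ w)        # trajectory point in prediction space

T = np.array(trajectories)
T -= T.mean(axis=0)
sv = np.linalg.svd(T, compute_uv=False)
explained = np.cumsum(sv ** 2) / np.sum(sv ** 2)
print("variance explained by top 3 components:", explained[2])
```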
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
We asked (a) whether having a 1/n neural code makes neural networks more robust, and (b) how the neural code …
We thank the reviewers for their insightful comments and suggestions. As pointed out by R1, R2 & R3, our experiments were only run on MNIST. We would like to draw the attention of R5 to this particular case. "We apologize for the confusion. To clarify, the whitening employed in section 4.2 is used to investigate …" "BN was only used for the shallow neural networks in section 4.1, as we found that …" According to the theory developed by Stringer et al., having …
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Ghosh, Arna, Chorghay, Zahraa, Bakhtiari, Shahab, Richards, Blake A.
Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most areas. However, certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited scale-free geometry, but not after fine-tuning on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
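The power-law diagnostic invoked here can be estimated directly from a responses matrix. The sketch below is a generic, assumed implementation (not the authors' code): it computes the covariance eigenspectrum of a stimuli-by-units matrix and fits its decay exponent, with alpha near 1 indicating a scale-free, 1/n-like spectrum in the spirit of Stringer et al.

```python
import numpy as np

# Minimal sketch under assumptions: estimate the power-law exponent alpha of a
# representation's covariance eigenspectrum from a (stimuli x units) matrix.
def powerlaw_exponent(responses: np.ndarray, rank_range=(10, 100)) -> float:
    responses = responses - responses.mean(axis=0)
    eigs = np.linalg.svd(responses, compute_uv=False) ** 2 / responses.shape[0]
    ranks = np.arange(1, eigs.size + 1)
    lo, hi = rank_range
    slope, _ = np.polyfit(np.log(ranks[lo:hi]), np.log(eigs[lo:hi]), deg=1)
    return -slope  # alpha ~ 1 suggests a scale-free (1/n) spectrum

# Synthetic check: responses constructed to have an approximately 1/n spectrum.
rng = np.random.default_rng(2)
fake_responses = rng.standard_normal((2000, 500)) @ np.diag(1 / np.sqrt(np.arange(1, 501)))
print("estimated power-law exponent:", powerlaw_exponent(fake_responses))
```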
- North America > Canada > Quebec > Montreal (0.15)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)