AITopics | hessian eigenvector

Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification

Hartoyo, Agus, Argasiński, Jan, Trenk, Aleksandra, Przybylska, Kinga, Błasiak, Anna, Crimi, Alessandro

arXiv.org Artificial Intelligence, Feb-14-2024

Covariance and Hessian matrices have been analyzed separately in the literature for classification problems. However, integrating these matrices has the potential to enhance their combined power in improving classification performance. We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model to achieve optimal class separability in binary classification tasks. Our approach is substantiated by formal proofs that establish its capability to maximize between-class mean distance and minimize within-class variances. By projecting data into the combined space of the most relevant eigendirections from both matrices, we achieve optimal class separability as per the linear discriminant analysis (LDA) criteria. Empirical validation across neural and health datasets consistently supports our theoretical framework and demonstrates that our method outperforms established methods. Our method stands out by addressing both LDA criteria, unlike PCA and the Hessian method, which predominantly emphasize one criterion each. This comprehensive approach captures intricate patterns and relationships, enhancing classification performance. Furthermore, through the utilization of both LDA criteria, our method outperforms LDA itself by leveraging higher-dimensional feature spaces, in accordance with Cover's theorem, which favors linear separability in higher dimensions. Our method also surpasses kernel-based methods and manifold learning techniques in performance. Additionally, our approach sheds light on complex DNN decision-making, rendering them comprehensible within a 2D space.
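For orientation, the two LDA criteria named in the abstract (large between-class mean distance, small within-class variance) are commonly combined into the Fisher criterion for a projection direction. The symbols below are standard textbook notation assumed here for illustration, not notation taken from the paper: w is the projection direction, and μ_c and Σ_c are the mean and covariance of class c.

```latex
J(\mathbf{w}) =
  \frac{\left(\mathbf{w}^{\top}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)\right)^{2}}
       {\mathbf{w}^{\top}\left(\boldsymbol{\Sigma}_0 + \boldsymbol{\Sigma}_1\right)\mathbf{w}}
```

The numerator is the squared distance between the projected class means and the denominator is the sum of the projected within-class variances, so a direction with large J satisfies both criteria at once.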

eigenanalysis, matrix, variance, (13 more...)

2402.09281

Country:

North America > United States > Wisconsin (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)
Asia > Indonesia > Java > West Java > Bandung (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Hessian Eigenvectors and Principal Component Analysis of Neural Network Weight Matrices

Haink, David

arXiv.org Artificial Intelligence, Nov-1-2023

This study delves into the intricate dynamics of trained deep neural networks and their relationships with network parameters. Trained networks predominantly continue training in a single direction, known as the drift mode. This drift mode can be explained by the quadratic potential model of the loss function, suggesting a slow exponential decay towards the potential minima. We unveil a correlation between Hessian eigenvectors and network weights. This relationship, hinging on the magnitude of eigenvalues, allows us to discern parameter directions within the network. Notably, the significance of these directions relies on two defining attributes: the curvature of their potential wells (indicated by the magnitude of Hessian eigenvalues) and their alignment with the weight vectors. Our exploration extends to the decomposition of weight matrices through singular value decomposition. This approach proves practical in identifying critical directions within the Hessian, considering both their magnitude and curvature. Furthermore, our examination showcases the applicability of principal component analysis in approximating the Hessian, with update parameters emerging as a superior choice over weights for this purpose. Remarkably, our findings unveil a similarity between the largest Hessian eigenvalues of individual layers and the entire network. Notably, higher eigenvalues are concentrated more in deeper layers. Leveraging these insights, we venture into addressing catastrophic forgetting, a challenge of neural networks when learning new tasks while retaining knowledge from previous ones. By applying our discoveries, we formulate an effective strategy to mitigate catastrophic forgetting, offering a possible solution that can be applied to networks of varying scales, including larger architectures.
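The "slow exponential decay" in a quadratic potential can be made concrete with a short derivation. This is a sketch under assumptions not stated in the abstract: plain gradient descent with learning rate η, a minimum θ*, and a Hessian H with eigenpairs (λ_i, v_i); the paper's actual optimizer and notation may differ.

```latex
\begin{aligned}
L(\theta) &\approx L(\theta^{*}) + \tfrac{1}{2}\,(\theta - \theta^{*})^{\top} H\,(\theta - \theta^{*}),
  \qquad H = \sum_i \lambda_i\, v_i v_i^{\top}, \\
\theta_{t+1} &= \theta_t - \eta\, H\,(\theta_t - \theta^{*}), \\
c_i(t) &:= v_i^{\top}(\theta_t - \theta^{*})
  \;\Longrightarrow\;
  c_i(t) = (1 - \eta\lambda_i)^{t}\, c_i(0) \approx e^{-\eta\lambda_i t}\, c_i(0)
  \quad \text{for } \eta\lambda_i \ll 1 .
\end{aligned}
```

Under these assumptions each eigendirection relaxes independently with time constant 1/(ηλ_i), so the direction with the smallest positive eigenvalue decays most slowly and dominates late training.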

eigenvector and principal component analysis, hessian eigenvector, neural network weight matrix

2311.00452

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.60)

Unveiling the Hessian's Connection to the Decision Boundary

Sabanayagam, Mahalakshmi, Behrens, Freya, Adomaityte, Urte, Dawid, Anna

arXiv.org Artificial Intelligence, Jun-12-2023

Understanding the properties of well-generalizing minima is at the heart of deep learning research. On the one hand, the generalization of neural networks has been connected to the decision boundary complexity, which is hard to study in the high-dimensional input space. Conversely, the flatness of a minimum has become a controversial proxy for generalization. In this work, we provide the missing link between the two approaches and show that the Hessian top eigenvectors characterize the decision boundary learned by the neural network. Notably, the number of outliers in the Hessian spectrum is proportional to the complexity of the decision boundary. Based on this finding, we provide a new and straightforward approach to studying the complexity of a high-dimensional decision boundary; show that this connection naturally inspires a new generalization measure; and finally, we develop a novel margin estimation technique which, in combination with the generalization measure, precisely identifies minima with simple wide-margin boundaries. Overall, this analysis establishes the connection between the Hessian and the decision boundary and provides a new method to identify minima with simple wide-margin decision boundaries.
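As background on how a top Hessian eigenvector is typically obtained without forming the full Hessian, here is a minimal sketch using power iteration on finite-difference Hessian-vector products. It is not the authors' procedure; the function names (`hessian_vector_product`, `top_hessian_eigenpair`, `toy_grad`) and the toy quadratic loss are hypothetical and chosen only for illustration.

```python
import math

def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H @ v by a central difference of the gradient around w."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]

def top_hessian_eigenpair(grad_fn, w, iters=200):
    """Power iteration: converges to the eigenvalue of largest magnitude."""
    n = len(w)
    v = [1.0 / math.sqrt(n)] * n  # fixed start vector, assumed not orthogonal to the target
    eigenvalue = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, w, v)
        eigenvalue = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return eigenvalue, v

def toy_grad(w):
    """Gradient of the toy loss 0.5 * w^T H w with H = [[3, 1], [1, 2]] (illustrative)."""
    return [3.0 * w[0] + 1.0 * w[1], 1.0 * w[0] + 2.0 * w[1]]

eigenvalue, eigenvector = top_hessian_eigenpair(toy_grad, [0.5, -0.3])
```

For a real network, `grad_fn` would be replaced by the gradient of the training loss with respect to the flattened parameters, and an exact Hessian-vector product from an automatic differentiation library would usually be preferred over finite differences.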

artificial intelligence, deep learning, machine learning, (18 more...)

2306.07104

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances

Kühn, Marcel, Rosenow, Bernd

arXiv.org Artificial Intelligence, Jun-8-2023

Stochastic gradient descent (SGD) has become a cornerstone of neural network optimization, yet the noise introduced by SGD is often assumed to be uncorrelated over time, despite the ubiquity of epoch-based training. In this work, we challenge this assumption and investigate the effects of epoch-based noise correlations on the stationary distribution of discrete-time SGD with momentum, limited to a quadratic loss. Our main contributions are twofold: first, we calculate the exact autocorrelation of the noise for training in epochs under the assumption that the noise is independent of small fluctuations in the weight vector; second, we explore the influence of correlations introduced by the epoch-based learning scheme on SGD dynamics. We find that for directions with a curvature greater than a hyperparameter-dependent crossover value, the results for uncorrelated noise are recovered. However, for relatively flat directions, the weight variance is significantly reduced. We provide an intuitive explanation for these results based on a crossover between correlation times, contributing to a deeper understanding of the dynamics of SGD in the presence of epoch-based noise correlations.
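The uncorrelated-noise baseline that the abstract compares against can be sketched for a single eigendirection of a quadratic loss. The code below is not the paper's epoch-correlated calculation: it assumes white gradient noise, a heavy-ball update (v ← βv − η(λw + ξ), w ← w + v), and a hypothetical function name, and it iterates the covariance recursion Σ ← AΣAᵀ + Q until it settles.

```python
def stationary_weight_variance(lr, curvature, momentum, noise_var, steps=10000):
    """Stationary Var(w) for heavy-ball SGD on 0.5 * curvature * w**2 with white noise.

    State is (w, v). One step is the linear map
        w' = (1 - lr*curvature) * w + momentum * v - lr * xi
        v' = (  - lr*curvature) * w + momentum * v - lr * xi
    so the covariance obeys Sigma' = A Sigma A^T + Q.
    """
    a11, a12 = 1.0 - lr * curvature, momentum
    a21, a22 = -lr * curvature, momentum
    q = lr * lr * noise_var  # the same noise term enters both w and v
    s_ww = s_wv = s_vv = 0.0
    for _ in range(steps):
        n_ww = a11 * a11 * s_ww + 2.0 * a11 * a12 * s_wv + a12 * a12 * s_vv + q
        n_wv = a11 * a21 * s_ww + (a11 * a22 + a12 * a21) * s_wv + a12 * a22 * s_vv + q
        n_vv = a21 * a21 * s_ww + 2.0 * a21 * a22 * s_wv + a22 * a22 * s_vv + q
        s_ww, s_wv, s_vv = n_ww, n_wv, n_vv
    return s_ww

# Illustrative hyperparameters (assumed, not from the paper).
steep = stationary_weight_variance(lr=0.1, curvature=1.0, momentum=0.0, noise_var=1.0)
flat = stationary_weight_variance(lr=0.1, curvature=0.1, momentum=0.0, noise_var=1.0)
with_momentum = stationary_weight_variance(lr=0.1, curvature=1.0, momentum=0.9, noise_var=1.0)
```

Without momentum the recursion reduces to the closed form η·σ²/(λ(2 − ηλ)), which grows as the direction becomes flatter. The abstract's finding is that epoch-based correlations reduce the variance in such flat directions relative to this white-noise picture.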

artificial intelligence, eigenvalue, machine learning, (17 more...)

2306.053

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Saxony > Leipzig (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Q-SHED: Distributed Optimization at the Edge via Hessian Eigenvectors Quantization

Fabbro, Nicolò Dal, Rossi, Michele, Schenato, Luca, Dey, Subhrakanti

arXiv.org Artificial Intelligence, May-18-2023

Edge networks call for communication efficient (low overhead) and robust distributed optimization (DO) algorithms. These are, in fact, desirable qualities for DO frameworks, such as federated edge learning techniques, in the presence of data and system heterogeneity, and in scenarios where internode communication is the main bottleneck. Although computationally demanding, Newton-type (NT) methods have been recently advocated as enablers of robust convergence rates in challenging DO problems where edge devices have sufficient computational power. Along these lines, in this work we propose Q-SHED, an original NT algorithm for DO featuring a novel bit-allocation scheme based on incremental Hessian eigenvectors quantization. The proposed technique is integrated with the recent SHED algorithm, from which it inherits appealing features like the small number of required Hessian computations, while being bandwidth-versatile at a bit-resolution level. Our empirical evaluation against competing approaches shows that Q-SHED can reduce by up to 60% the number of communication rounds required for convergence.

artificial intelligence, eigenvector, machine learning, (16 more...)

2305.10852

Country:

Europe > Sweden > Uppsala County > Uppsala (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
