AITopics | sharpness measure

Appendix

Neural Information Processing Systems, Apr-29-2026, 23:47:55 GMT

This appendix is structured as follows. Appendix A provides further training details: the hyperparameters used for the CIFAR experiments (A.1) and for the ImageNet experiments (A.2), and, in A.3, more details on and a formal definition of the SAM variants used throughout this paper. Appendix B reports additional experimental results for CIFAR (B.1), ImageNet (B.3), and a machine translation task (B.5); B.2 contains additional ablation studies for sparse-perturbation (SSAM) approaches, and B.4 extends the discussion on adversarial robustness.

artificial intelligence, machine learning, sam-on

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

da909fc3893d272f26fd9db82e09d954-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-17-2026, 11:20:50 GMT

artificial intelligence, machine learning, sam-on

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

b2ba568effcc3ab221912db2fb095ea9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 10:56:07 GMT

generalization performance, neural network, sharpness measure

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

948b1c9d660d7286dd767cd07dabd487-Paper-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 20:47:16 GMT

dataset, international conference, trajectory loss

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Sharpness-Aware Training for Free

Neural Information Processing Systems, Dec-24-2025, 20:07:43 GMT

Modern deep neural networks (DNNs) have achieved state-of-the-art performance but are typically over-parameterized. The over-parameterization may result in undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur a two-fold computational overhead of the given base optimizer (e.g.
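The two-fold overhead comes from the structure of the standard SAM update: each step first computes the gradient at the current weights to build a worst-case perturbation of radius rho, then computes a second gradient at the perturbed point and applies it to the original weights. The sketch below illustrates this on a hypothetical two-parameter quadratic loss; the curvatures, learning rate and rho are illustrative assumptions, not values from the paper, and a counter makes the doubled gradient cost visible.

```python
import math

# Hypothetical toy loss L(w) = 0.5 * sum(h_i * w_i^2); curvatures are illustrative.
CURVATURES = [10.0, 1.0]
grad_calls = 0  # counts gradient evaluations to expose SAM's extra cost


def loss(w):
    return 0.5 * sum(h * wi * wi for h, wi in zip(CURVATURES, w))


def grad(w):
    global grad_calls
    grad_calls += 1
    return [h * wi for h, wi in zip(CURVATURES, w)]


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM step: perturb towards the first-order worst case within an
    L2 ball of radius rho, then descend using the gradient taken there."""
    g = grad(w)  # 1st gradient evaluation (at w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    eps = [rho * gi / norm for gi in g]  # ascent direction scaled to length rho
    w_adv = [wi + ei for wi, ei in zip(w, eps)]
    g_adv = grad(w_adv)  # 2nd gradient evaluation (at w + eps)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w)
print(grad_calls)  # → 200: two gradient evaluations per step
```

A plain gradient-descent baseline would need only 100 gradient evaluations for the same number of steps, which is the overhead that cheaper SAM variants aim to remove.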

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Rényi Sharpness: A Novel Sharpness that Strongly Correlates with Generalization

Zhang, Qiaozhe, Sun, Jun, Zhang, Ruijie, Liu, Yingzhuang

arXiv.org Artificial Intelligence, Oct-17-2025

Sharpness (of the loss minima) is a common measure for investigating the generalization of neural networks. Intuitively, the flatter the landscape near a minimum, the better the generalization may be. Unfortunately, the correlation between many existing sharpness measures and generalization is usually not strong, and sometimes even weak. To close the gap between intuition and reality, we propose a novel sharpness measure, Rényi sharpness, defined as the negative Rényi entropy (a generalization of the classical Shannon entropy) of the loss Hessian. The main ideas are as follows: 1) we recognize that uniform (identical) eigenvalues of the loss Hessian are most desirable (while keeping their sum constant) for achieving good generalization; 2) we employ the Rényi entropy to concisely characterize the spread of the eigenvalues of the loss Hessian. Normally, the larger the spread, the smaller the (Rényi) entropy. To rigorously establish the relationship between generalization and (Rényi) sharpness, we provide several generalization bounds in terms of Rényi sharpness, taking advantage of the reparametrization invariance of Rényi sharpness as well as the trick of translating data discrepancy into weight perturbation. Furthermore, extensive experiments verify the strong correlation (specifically, the Kendall rank correlation) between Rényi sharpness and generalization. Moreover, we propose using a variant of Rényi sharpness as a regularizer during training, Rényi Sharpness Aware Minimization (RSAM), which turns out to outperform all existing sharpness-aware minimization methods. Notably, the test accuracy gain of RSAM can be as high as nearly 2.5% compared with the classical SAM method.
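The two ingredients named in this abstract, a negative Rényi entropy of the Hessian spectrum and a Kendall rank correlation with generalization, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact definition: it assumes the Hessian eigenvalues are already available and non-negative, normalizes them to sum to one, and uses a hypothetical default order alpha = 2. The correlation uses the tau-a variant, where tied pairs count as neither concordant nor discordant.

```python
import math


def renyi_sharpness(eigenvalues, alpha=2.0):
    """Negative Renyi entropy of the normalized Hessian spectrum.

    Assumptions (illustrative): eigenvalues are non-negative with a positive
    sum, and alpha > 0 with alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(eigenvalues)
    if total <= 0 or any(lam < 0 for lam in eigenvalues):
        raise ValueError("expects non-negative eigenvalues with a positive sum")
    probs = [lam / total for lam in eigenvalues]
    # H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha); zero entries contribute nothing.
    entropy = math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)
    return -entropy


def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a) between two equally long sequences."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Same eigenvalue sum, different spread: the uniform spectrum is "flatter".
print(renyi_sharpness([1.0, 1.0, 1.0, 1.0]))  # equals -log(4)
print(renyi_sharpness([3.7, 0.1, 0.1, 0.1]))  # larger, i.e. sharper
```

In a correlation study one would compute `renyi_sharpness` for each trained model and pass those values, together with the measured generalization gaps, to `kendall_tau`.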

artificial intelligence, generalization, machine learning

arXiv.org Artificial Intelligence

2510.07758

Country: North America (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

b2ba568effcc3ab221912db2fb095ea9-Paper-Conference.pdf

Neural Information Processing Systems, Aug-17-2025, 22:10:33 GMT

artificial intelligence, machine learning, neural network

Neural Information Processing Systems

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

948b1c9d660d7286dd767cd07dabd487-Paper-Conference.pdf

Neural Information Processing Systems, Aug-17-2025, 02:09:35 GMT

artificial intelligence, international conference, machine learning

Neural Information Processing Systems

Country:

Asia > Singapore (0.14)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It

da Silva, Marvin F., Dangel, Felix, Oore, Sageev

arXiv.org Artificial Intelligence, May-9-2025

The concept of sharpness has been successfully applied to traditional architectures like MLPs and CNNs to predict their generalization. For transformers, however, recent work reported weak correlation between flatness and generalization. We argue that existing sharpness measures fail for transformers, because they have much richer symmetries in their attention mechanism that induce directions in parameter space along which the network or its loss remain identical. We posit that sharpness must account fully for these symmetries, and thus we redefine it on a quotient manifold that results from quotienting out the transformer symmetries, thereby removing their ambiguities. Leveraging tools from Riemannian geometry, we propose a fully general notion of sharpness, in terms of a geodesic ball on the symmetry-corrected quotient manifold. In practice, we need to resort to approximating the geodesics. Doing so up to first order yields existing adaptive sharpness measures, and we demonstrate that including higher-order terms is crucial to recover correlation with generalization. We present results on diagonal networks with synthetic data, and show that our geodesic sharpness reveals strong correlation for real-world transformers on both text and image classification tasks.
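Why symmetries obscure sharpness can already be seen on the simplest diagonal network. The sketch below is a hypothetical toy, not the paper's geodesic sharpness: for f(x) = u * v * x with a squared loss on one example, the rescaling (u, v) -> (a * u, v / a) leaves the loss unchanged, yet the plain first-order sharpness estimate rho * ||g|| changes with a. An elementwise adaptive estimate rho * ||w * g||, the first-order solution of the worst-case loss increase over a ball scaled by |w|, stays invariant. Treating this as the "adaptive sharpness" the abstract refers to is an assumption; attention symmetries in transformers are richer than a single rescaling, and the higher-order geodesic terms the abstract reports as crucial are not captured here.

```python
import math

# Hypothetical single training example for the toy diagonal network f(x) = u * v * x.
X, Y = 2.0, 1.0


def loss_grad(u, v):
    """Gradient of L(u, v) = (u * v * X - Y)^2 with respect to (u, v)."""
    r = u * v * X - Y  # residual
    return 2 * r * v * X, 2 * r * u * X


def first_order_sharpness(u, v, rho=0.1):
    """rho * ||g||: first-order worst-case loss increase over an L2 ball of radius rho."""
    gu, gv = loss_grad(u, v)
    return rho * math.hypot(gu, gv)


def first_order_adaptive_sharpness(u, v, rho=0.1):
    """rho * ||w * g||: the same estimate when the ball is scaled elementwise by |w|."""
    gu, gv = loss_grad(u, v)
    return rho * math.hypot(u * gu, v * gv)


# Both parameter settings implement the same function (u * v = 1.5).
for a in (1.0, 10.0):
    u, v = 1.0 * a, 1.5 / a
    print(a, first_order_sharpness(u, v), first_order_adaptive_sharpness(u, v))
```

The plain estimate grows under the rescaling even though the network is functionally identical, which is the kind of ambiguity the abstract proposes to remove by working on the quotient manifold.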

artificial intelligence, machine learning, sharpness

arXiv.org Artificial Intelligence

2505.05409

Country:

North America > Canada (0.67)
North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Sharpness-Aware Training for Free

Neural Information Processing Systems, Jan-17-2025, 23:00:34 GMT

Modern deep neural networks (DNNs) have achieved state-of-the-art performance but are typically over-parameterized. The over-parameterization may result in undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur a two-fold computational overhead of the given base optimizer (e.g. …). In this paper, we propose Sharpness-Aware Training for Free, or SAF, which mitigates the sharp landscape at almost zero additional computational cost over the base optimizer.

base optimizer, sharpness measure, sharpness-aware training

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)
