Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

Jan-18-2025, 21:02:32 GMT–Neural Information Processing Systems

Deep neural networks often suffer from poor generalization caused by complex and non-convex loss landscapes. One popular solution is Sharpness-Aware Minimization (SAM), which smooths the loss landscape by minimizing the maximum change in training loss when a perturbation is added to the weights. However, we find that SAM's indiscriminate perturbation of all parameters is suboptimal and also results in excessive computation, i.e., double the overhead of common optimizers such as Stochastic Gradient Descent (SGD). In this paper, we propose an efficient and effective training scheme called Sparse SAM (SSAM), which achieves sparse perturbation through a binary mask. To obtain the sparse mask, we provide two solutions, based on Fisher information and dynamic sparse training, respectively.
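The masked perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy quadratic loss, a perturbation of the form rho * g / ||g|| multiplied elementwise by a binary mask, and a top-k mask over squared gradients as a rough stand-in for a Fisher-information criterion. The names `CURVATURE`, `grad`, `topk_mask`, and `sparse_sam_step` are hypothetical.

```python
import math

# Hypothetical per-coordinate curvature of a toy loss f(w) = 0.5 * sum(a_i * w_i^2).
CURVATURE = [10.0, 1.0, 0.1, 5.0]


def grad(w):
    """Gradient of the toy quadratic loss (stand-in for a minibatch gradient)."""
    return [a * wi for a, wi in zip(CURVATURE, w)]


def topk_mask(scores, sparsity):
    """Binary mask that keeps the (1 - sparsity) fraction of largest scores."""
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_keep]
    mask = [0.0] * len(scores)
    for i in keep:
        mask[i] = 1.0
    return mask


def sparse_sam_step(w, lr=0.1, rho=0.05, sparsity=0.5):
    """One SAM-style update in which only masked coordinates are perturbed."""
    g = grad(w)                                          # first gradient, at w
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Assumption: squared gradients approximate a diagonal Fisher score.
    mask = topk_mask([gi * gi for gi in g], sparsity)
    # Sparse perturbation: zero wherever the mask is zero.
    eps = [rho * gi / norm * mi for gi, mi in zip(g, mask)]
    g_pert = grad([wi + ei for wi, ei in zip(w, eps)])   # second gradient, at w + eps
    new_w = [wi - lr * gi for wi, gi in zip(w, g_pert)]
    return new_w, mask


new_w, mask = sparse_sam_step([1.0, 1.0, 1.0, 1.0])
print(mask)
```

With `sparsity=0.0` every coordinate is perturbed, which corresponds to the dense SAM-style perturbation. The sketch still evaluates two full gradients per step, so it shows only the masking logic and not any computational saving.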

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.62)
  - Neural Networks (0.62)