Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness-Aware Minimization

Jiaxin Deng, Junbiao Pang, Baochang Zhang, Tian Wang

arXiv.org Artificial Intelligence 

In recent years, much research has been proposed to understand the generalization of DNNs [Keskar et al., 2017; Zhang et al., 2021; Mulayoff and Michaeli, 2020; Andriushchenko and Flammarion, 2022; Zhou et al., 2021; Zhou et al., 2022]. Several studies have verified the relationship between flat minima and generalization error [Dinh et al., 2017; Li et al., 2018; Jiang et al., 2020; Liu et al., 2020; Sun et al., 2021]. Among these studies, Jiang et al. [Jiang et al., 2020] explored over 40 complexity measures and demonstrated that a sharpness-based measure exhibits the highest correlation with generalization.

However, SAM requires two forward and backward operations in one iteration, which results in SAM's optimization speed being only half that of SGD. In some scenarios, dedicating twice the training time to achieve only a marginal improvement in accuracy may not strike an optimal balance between accuracy and efficiency. For example, when SAM is employed to optimize WideResNet-28-10 on CIFAR-100, despite achieving a higher test accuracy than SGD (84.45% vs. 82.89%), the optimization speed is only half that of SGD (343 imgs/s vs. 661 imgs/s), as illustrated in Figure 1.
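The cost argument above can be made concrete with a minimal sketch: one SAM iteration evaluates the gradient twice (once at the current weights to build the ascent perturbation, and once at the perturbed point), whereas SGD evaluates it once, which is exactly why SAM's throughput is roughly half. The toy quadratic loss and the values of the step size `lr` and perturbation radius `rho` below are illustrative assumptions, not settings from the paper.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w with one flat and one sharp direction.
A = np.diag([1.0, 10.0])

n_grad_calls = 0  # counts forward/backward passes to expose SAM's 2x cost

def grad(w):
    """Gradient of L(w) = 0.5 * w @ A @ w; increments the pass counter."""
    global n_grad_calls
    n_grad_calls += 1
    return A @ w

def sam_step(w, lr=0.1, rho=0.05):
    # Sketch of one SAM update: ascend to the sharpest nearby point,
    # then descend using the gradient taken there.
    g1 = grad(w)                                    # 1st forward/backward pass
    eps = rho * g1 / (np.linalg.norm(g1) + 1e-12)   # normalized ascent perturbation
    g2 = grad(w + eps)                              # 2nd forward/backward pass
    return w - lr * g2

def sgd_step(w, lr=0.1):
    return w - lr * grad(w)                         # single pass per iteration

w = np.array([1.0, 1.0])
init_loss = 0.5 * w @ A @ w
for _ in range(50):
    w = sam_step(w)
final_loss = 0.5 * w @ A @ w
print(n_grad_calls)  # 100 gradient evaluations for 50 SAM steps
```

Running the same 50 iterations with `sgd_step` would cost only 50 gradient evaluations; the counter makes the two-passes-per-iteration overhead explicit.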
