Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness-Aware Minimization

Jiaxin Deng, Junbiao Pang, Baochang Zhang, Tian Wang

arXiv.org Artificial Intelligence 

In recent years, much research has been proposed to understand the generalization of DNNs [Keskar et al., 2017; Zhang et al., 2021; Mulayoff and Michaeli, 2020; Andriushchenko and Flammarion, 2022; Zhou et al., 2021; Zhou et al., 2022]. Several studies have verified the relationship between flat minima and generalization error [Dinh et al., 2017; Li et al., 2018; Jiang et al., 2020; Liu et al., 2020; Sun et al., 2021]. Among these studies, Jiang et al. [Jiang et al., 2020] explored over 40 complexity measures and demonstrated that a sharpness-based measure exhibits the highest correlation with generalization.

However, SAM requires two forward and backward operations in one iteration, which results in SAM's optimization speed being only half that of SGD. In some scenarios, dedicating twice the training time to achieve only a marginal improvement in accuracy may not strike an optimal balance between accuracy and efficiency. For example, when SAM is employed to optimize WideResNet-28-10 on CIFAR-100, despite achieving a higher test accuracy than SGD (84.45% vs. 82.89%), the optimization speed is only half that of SGD (343 imgs/s vs. 661 imgs/s), as illustrated in Figure 1.
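The cost argument above can be made concrete with a minimal sketch: one SAM iteration evaluates the gradient twice (once at the current weights to build the ascent perturbation, and once at the perturbed point), whereas SGD evaluates it once, which is exactly why SAM's throughput is roughly half. The toy quadratic loss and the values of the step size `lr` and perturbation radius `rho` below are illustrative assumptions, not settings from the paper.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T A w with one flat and one sharp direction.
A = np.diag([1.0, 10.0])

n_grad_calls = 0  # counts forward/backward passes to expose SAM's 2x cost

def grad(w):
    """Gradient of L(w) = 0.5 * w @ A @ w; increments the pass counter."""
    global n_grad_calls
    n_grad_calls += 1
    return A @ w

def sam_step(w, lr=0.1, rho=0.05):
    # Sketch of one SAM update: ascend to the sharpest nearby point,
    # then descend using the gradient taken there.
    g1 = grad(w)                                    # 1st forward/backward pass
    eps = rho * g1 / (np.linalg.norm(g1) + 1e-12)   # normalized ascent perturbation
    g2 = grad(w + eps)                              # 2nd forward/backward pass
    return w - lr * g2

def sgd_step(w, lr=0.1):
    return w - lr * grad(w)                         # single pass per iteration

w = np.array([1.0, 1.0])
init_loss = 0.5 * w @ A @ w
for _ in range(50):
    w = sam_step(w)
final_loss = 0.5 * w @ A @ w
print(n_grad_calls)  # 100 gradient evaluations for 50 SAM steps
```

Running the same 50 iterations with `sgd_step` would cost only 50 gradient evaluations; the counter makes the two-passes-per-iteration overhead explicit.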
