Sharpness-Aware Minimization
This post covers a recent optimization method for training neural networks, described in the paper "Sharpness-Aware Minimization for Efficiently Improving Generalization" by P. Foret et al. (December 2020). Honestly, when I first read the details of the paper, I assumed that the procedure it describes (or something similar) had already been explored years ago by plenty of people, and I was even surprised to read that it works in some contexts.

Modern models are trained with optimization methods that rely on the training loss alone. Such models can easily memorize the training data and are prone to overfitting. They have more parameters than they need, and that large parameter count provides no guarantee of proper generalization to the test set.
Jan-24-2022, 14:10:31 GMT