Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

Open in new window