Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
