Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
