Neglected Hessian component explains mysteries in sharpness regularization
Neural Information Processing Systems
Sharpness-aware minimization (SAM) can improve generalization in deep learning, yet seemingly similar methods such as weight noise and gradient penalties often fail to provide such benefits. We investigate this inconsistency and reveal its connection to the structure of the Hessian of the loss.
Feb-18-2026, 15:13:05 GMT