Neglected Hessian component explains mysteries in sharpness regularization

Neural Information Processing Systems 

Sharpness-Aware Minimization (SAM) can improve generalization in deep learning, yet seemingly similar methods such as weight noise and gradient penalties often fail to provide the same benefits. We investigate this inconsistency and reveal its connection to the structure of the Hessian of the loss.
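To make the three methods named in the abstract concrete, here is a minimal sketch on a hypothetical two-parameter quadratic loss. The matrix `A`, the function names, and the values of `rho` and `sigma` are assumptions chosen for illustration and do not come from the paper. On a purely quadratic loss the SAM gradient and the gradient of a gradient-norm penalty coincide exactly, which is one way to see why the methods look alike; the sketch does not attempt to reproduce the paper's findings on where they diverge.

```python
import math
import random

# Hypothetical toy loss L(w) = 0.5 * w^T A w with a fixed symmetric matrix A.
A = [[3.0, 1.0], [1.0, 2.0]]

def matvec(m, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def grad(w):
    # Gradient of the toy loss: A w.
    return matvec(A, w)

def sam_grad(w, rho):
    # SAM: evaluate the gradient at the perturbed point w + rho * g / ||g||.
    g = grad(w)
    n = norm(g)
    w_adv = [wi + rho * gi / n for wi, gi in zip(w, g)]
    return grad(w_adv)

def penalty_grad(w, rho):
    # Gradient of L(w) + rho * ||grad L(w)||, which equals g + rho * H g / ||g||.
    # For this toy loss the Hessian H is exactly A.
    g = grad(w)
    n = norm(g)
    hg = matvec(A, g)
    return [gi + rho * hgi / n for gi, hgi in zip(g, hg)]

def weight_noise_grad(w, sigma, rng):
    # Weight noise: evaluate the gradient at a randomly perturbed point.
    w_noisy = [wi + sigma * rng.gauss(0.0, 1.0) for wi in w]
    return grad(w_noisy)

w = [1.0, -2.0]
rho = 0.05
print("SAM:          ", sam_grad(w, rho))
print("Grad penalty: ", penalty_grad(w, rho))
print("Weight noise: ", weight_noise_grad(w, 0.05, random.Random(0)))
```

The agreement between `sam_grad` and `penalty_grad` holds here only because the toy Hessian is constant; for a neural network loss the Hessian depends on the weights, and the two updates generally differ.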
