When Flatness Does (Not) Guarantee Adversarial Robustness

Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp

arXiv.org Artificial Intelligence 

Despite their empirical success, neural networks remain vulnerable to small, adversarial perturbations. A longstanding hypothesis suggests that flat minima, regions of low curvature in the loss landscape, offer increased robustness. While intuitive, this connection has remained largely informal and incomplete. By rigorously formalizing the relationship, we show this intuition is only partially correct: flatness implies local but not global adversarial robustness. To arrive at this result, we first derive a closed-form expression for relative flatness in the penultimate layer, and then show we can use this to constrain the variation of the loss in input space. This allows us to formally analyze the adversarial robustness of the entire network. We then show that to maintain robustness beyond a local neighborhood, the loss needs to curve sharply away from the data manifold. We validate our theoretical predictions empirically across architectures and datasets, uncovering the geometric structure that governs adversarial vulnerability, and linking flatness to model confidence: adversarial examples often lie in large, flat regions where the model is confidently wrong. Our results challenge simplified views of flatness and provide a nuanced understanding of its role in robustness.

Despite their success across a wide range of tasks, neural networks remain notoriously brittle under adversarial perturbations. Small, often imperceptible changes to the input can dramatically alter a model's prediction. Understanding the structural properties that contribute to this vulnerability is central to building more robust systems. One property that has long attracted attention is the flatness of the loss surface. Earlier work suggested that flatter minima correlate with better generalization (Hochreiter & Schmidhuber, 1994; Jiang et al., 2019); however, whether this link holds universally remains an open question (Andriushchenko et al., 2023).
Flatness has also emerged as a potential indicator of adversarial robustness (Wu et al., 2020): a model whose loss landscape is locally flat in parameter space might resist small perturbations in input space. At first glance, the two notions appear disconnected, since adversarial examples concern the change of the loss with respect to the input, whereas flatness quantifies its change with respect to the weights.
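For the last layer at least, a simple linear-algebra identity can narrow this apparent gap. The sketch below assumes a linear final layer producing logits `W @ phi`, where `phi` stands for the penultimate-layer features of one input, and a softmax cross-entropy loss. The rank-one construction, the helper names (`cross_entropy`, `feature_to_weight_perturbation`) and the random data are illustrative assumptions, not the authors' derivation. The sketch shows that perturbing the features by `delta` gives exactly the same loss as perturbing the weights by a matrix `Delta` with Frobenius norm `||W delta|| / ||phi||`. So flatness in weight space over a neighbourhood of that size would bound how much the loss can change under small feature-space perturbations. Carrying such a bound back to the raw input requires further steps, which this sketch does not attempt.

```python
import numpy as np


def cross_entropy(logits, label):
    # Numerically stable softmax cross-entropy for a single example.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def feature_to_weight_perturbation(W, phi, delta):
    """Hypothetical helper: rank-one Delta with (W + Delta) @ phi == W @ (phi + delta).

    Illustrative choice Delta = (W @ delta) phi^T / ||phi||^2; any Delta with
    Delta @ phi == W @ delta would work equally well.
    """
    norm_sq = float(phi @ phi)
    if norm_sq == 0.0:
        raise ValueError("feature vector phi must be nonzero")
    return np.outer(W @ delta, phi) / norm_sq


rng = np.random.default_rng(0)
num_classes, dim = 3, 5
W = rng.normal(size=(num_classes, dim))  # assumed last-layer weights
phi = rng.normal(size=dim)               # assumed penultimate features of one input
delta = 0.1 * rng.normal(size=dim)       # small perturbation in feature space
label = 1

Delta = feature_to_weight_perturbation(W, phi, delta)

loss_feature_perturbed = cross_entropy(W @ (phi + delta), label)
loss_weight_perturbed = cross_entropy((W + Delta) @ phi, label)

# Both perturbations yield the same logits, hence the same loss.
print(np.isclose(loss_feature_perturbed, loss_weight_perturbed))  # True

# Size of the equivalent weight perturbation: ||Delta||_F = ||W delta|| / ||phi||.
print(np.isclose(np.linalg.norm(Delta),
                 np.linalg.norm(W @ delta) / np.linalg.norm(phi)))  # True
```

The rank-one choice is made only because it gives a closed-form norm. It aligns the weight perturbation with the feature vector, so the smaller `||phi||` is, the larger the weight perturbation needed to mimic the same feature shift.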